Systems and methods for data processing

ABSTRACT

A method for querying data is provided. The method may include determining a characteristic value of a selected feature dimension among feature values of the selected feature dimension of a plurality of entities and establishing a corresponding relationship between the characteristic value and the selected feature dimension. The method may also include caching the corresponding feature value into a cache memory for each entity having a feature value of the selected feature dimension being unequal to the characteristic value, and leaving the corresponding feature value without caching for each entity having a feature value of the selected feature dimension being equal to the characteristic value. The method may further include performing a first search in the cache memory to produce a first search result in response to a query request related to the plurality of entities, and generating a query result based on the corresponding relationship and the first search result.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 16/703,965, filed on Dec. 5, 2019, which is a Continuation of International Application No. PCT/CN2018/089141, filed on May 31, 2018, which claims priority of Chinese Application No. 201710414760.0, filed on Jun. 5, 2017, the contents of which are expressly incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present disclosure generally relates to systems and methods for data processing, and in particular, to systems and methods for caching data.

BACKGROUND

In internet technologies, a service system often needs to store and process an ever-increasing amount of data, such as feature information of millions of users. In order to increase the speed of data querying and processing, the service system often utilizes a cache memory to store information that is frequently used by the processor of the service system. The cache memory is a high-speed memory that the processor can access more quickly than it accesses other regular storage devices, such as hard disks. However, the size of cache memory is often relatively small compared with regular storage devices. Thus, it is desirable to develop systems and methods for caching data more efficiently, thus improving data processing and computer functions.

SUMMARY

In one aspect of the present disclosure, a system for querying data is provided. The system may include at least one storage medium, a cache memory for storing data, and at least one processor in communication with the at least one storage medium and the cache memory. The at least one storage medium may include a set of instructions and feature information of a plurality of entities. The feature information may include at least one feature dimension for each entity and at least one feature value for each feature dimension. When executing the set of instructions, the at least one processor may be configured to direct the system to determine a characteristic value of a selected feature dimension among the feature values of the selected feature dimension of the plurality of entities and establish a corresponding relationship between the characteristic value and the selected feature dimension. For each entity having a feature value of the selected feature dimension being unequal to the characteristic value, the at least one processor may be configured to direct the system to cache the corresponding feature value of the selected feature dimension into the cache memory. For each entity having a feature value of the selected feature dimension being equal to the characteristic value, the at least one processor may be configured to direct the system to leave the corresponding feature value of the selected feature dimension without caching. In response to a query request related to the plurality of entities, the at least one processor may be further configured to direct the system to perform a first search in the cache memory to produce a first search result. The at least one processor may be further configured to direct the system to generate a query result based on the corresponding relationship and the first search result.

In some embodiments, to determine the characteristic value of the selected feature dimension, the at least one processor may be further configured to direct the system to determine a mode of the feature values of the selected feature dimension of the plurality of entities as the characteristic value of the selected feature dimension.

In some embodiments, the at least one processor may be further configured to direct the system to update the feature information of the plurality of entities in the at least one storage medium, and determine an updated characteristic value of the selected feature dimension based on the updated feature information.

In some embodiments, to generate the query result, the at least one processor may be further configured to direct the system to replace one or more empty returns for the selected feature dimension in the first search result with the characteristic value.

In some embodiments, to generate the query result, the at least one processor may be further configured to direct the system to determine whether the first search result includes an empty return. In response to a determination that the first search result includes an empty return, the at least one processor may be further configured to direct the system to cache the characteristic value of the selected feature dimension into the cache memory based on the corresponding relationship for each entity whose selected feature dimension has an empty entry. The at least one processor may be further configured to direct the system to perform a second search in the cache memory in response to the query request to produce a second search result, and generate the query result based on the second search result.

In some embodiments, to perform the first search in the cache memory in response to the query request, the at least one processor may be further configured to direct the system to determine whether the query request is related to the selected feature dimension and the corresponding characteristic value. In response to a determination that the query request is related to the selected feature dimension and the corresponding characteristic value, the at least one processor may be further configured to direct the system to update the query request. The updated query request may include the feature dimension and an empty entry. The at least one processor may be further configured to direct the system to perform the first search in the cache memory based on the updated query request.

In some embodiments, to perform the first search in the cache memory in response to the query request, the at least one processor may be further configured to direct the system to determine whether the query request is related to the selected feature dimension. In response to a determination that the query request is related to the selected feature dimension, the at least one processor may be further configured to direct the system to cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry. The at least one processor may be further configured to direct the system to perform the first search in the cache memory in response to the query request.

In some embodiments, to perform the first search in the cache memory in response to the query request, the at least one processor may be further configured to direct the system to cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry and perform the first search in the cache memory in response to the query request.

In some embodiments, the plurality of entities may include at least one of service requesters, service providers, or service orders in an Online to Offline (O2O) service system.

In another aspect of the present disclosure, a method is provided. The method may be implemented on a computing device having at least one processor, at least one storage medium, a cache memory, and a communication platform connected to a network. The at least one storage medium may include feature information of a plurality of entities. The feature information may include at least one feature dimension for each entity and at least one feature value for each feature dimension. The method may include determining a characteristic value of a selected feature dimension among the feature values of the selected feature dimension of the plurality of entities and establishing a corresponding relationship between the characteristic value and the selected feature dimension. The method may also include caching the corresponding feature value of the selected feature dimension into the cache memory for each entity having a feature value of the selected feature dimension being unequal to the characteristic value. The method may further include leaving the corresponding feature value of the selected feature dimension without caching for each entity having a feature value of the selected feature dimension being equal to the characteristic value. The method may further include performing a first search in the cache memory to produce a first search result in response to a query request related to the plurality of entities. The method may further include generating a query result based on the corresponding relationship and the first search result.

In another aspect of the present disclosure, a non-transitory computer-readable storage medium embodying a computer program product is provided. The computer program product comprising instructions may be configured to cause a computing device to determine a characteristic value of a selected feature dimension among a plurality of feature values of the selected feature dimension of a plurality of entities and establish a corresponding relationship between the characteristic value and the selected feature dimension. The computer program product comprising instructions may be further configured to cause the computing device to cache the corresponding feature value of the selected feature dimension into a cache memory for each entity having a feature value of the selected feature dimension being unequal to the characteristic value. The computer program product comprising instructions may be further configured to cause the computing device to leave the corresponding feature value of the selected feature dimension without caching for each entity having a feature value of the selected feature dimension being equal to the characteristic value. The computer program product comprising instructions may be further configured to cause the computing device to perform a first search in the cache memory to produce a first search result in response to a query request related to the plurality of entities. The computer program product comprising instructions may be further configured to cause the computing device to generate a query result based on the corresponding relationship and the first search result.

Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:

FIG. 1 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure;

FIG. 2 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure;

FIG. 3 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure;

FIG. 4 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure;

FIG. 5 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure;

FIG. 6 is a block diagram illustrating an exemplary data processing device according to some embodiments of the present disclosure;

FIG. 7 is a block diagram illustrating an exemplary data processing device according to some embodiments of the present disclosure;

FIG. 8 is a block diagram illustrating an exemplary data processing device according to some embodiments of the present disclosure;

FIG. 9 is a block diagram illustrating an exemplary data processing device according to some embodiments of the present disclosure;

FIG. 10 is a schematic diagram of an exemplary mobile device according to some embodiments of the present disclosure;

FIG. 11 is a schematic diagram illustrating an exemplary data processing system according to some embodiments of the present disclosure;

FIG. 12 is a schematic diagram illustrating exemplary hardware and/or software components of a computing device according to some embodiments of the present disclosure; and

FIG. 13 is a schematic diagram illustrating exemplary hardware and/or software components of a mobile device on which a terminal may be implemented according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” “include,” “includes,” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It will be understood that the terms “system,” “engine,” “unit,” “module,” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be replaced by other expressions if they achieve the same purpose.

Generally, the word “module,” “unit,” or “block,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or another storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution). Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an erasable programmable read-only memory (EPROM). It will be further appreciated that hardware modules/units/blocks may be included in connected logic components, such as gates and flip-flops, and/or can be composed of programmable units, such as programmable gate arrays or processors. The modules/units/blocks or computing device functionality described herein may be implemented as software modules/units/blocks, but may be represented in hardware or firmware. In general, the modules/units/blocks described herein refer to logical modules/units/blocks that may be combined with other modules/units/blocks or divided into sub-modules/sub-units/sub-blocks despite their physical organization or storage. The description may be applicable to a system, an engine, or a portion thereof.

It will be understood that when a unit, engine, module, or block is referred to as being “on,” “connected to,” or “coupled to,” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.

The flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments in the present disclosure. It is to be expressly understood that the operations of the flowcharts need not be implemented in the order shown. Instead, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.

Embodiments of the present disclosure may be applied to different transportation systems including but not limited to land transportation, sea transportation, air transportation, space transportation, or the like, or any combination thereof. A vehicle of the transportation systems may include a rickshaw, travel tool, taxi, chauffeured car, hitch, bus, rail transportation (e.g., a train, a bullet train, high-speed rail, and subway), ship, airplane, spaceship, hot-air balloon, driverless vehicle, or the like, or any combination thereof. The transportation system may also include any transportation system that applies management and/or distribution, for example, a system for sending and/or receiving an express.

The application scenarios of different embodiments of the present disclosure may include, but are not limited to, one or more web pages, browser plugins and/or extensions, client terminals, custom systems, intracompany analysis systems, artificial intelligence robots, or the like, or any combination thereof. It should be understood that application scenarios of the system and method disclosed herein are only some examples or embodiments. Those having ordinary skill in the art, without further creative efforts, may apply these drawings to other application scenarios.

It should be understood that, although terms such as “first,” “second,” and “third” may be used to describe various kinds of information in the present application, the information may be described by any other term. The terms are only used to distinguish different information from each other. For example, first information may also be referred to as second information without departing from the scope of the present application. Similarly, second information may also be referred to as first information. The term “if” may refer to “when” or “in response to a determination”.

The present disclosure relates to systems and methods for data processing. The systems may include a storage medium, a cache memory, and a processor. The storage medium may store feature information of a plurality of entities. The feature information may include feature values of one or more feature dimensions of each entity. The cache memory may be a high-speed memory that the processor can access more quickly than it accesses the storage medium. In order to better utilize the cache memory, the processor may select a portion of the feature information and cache the selected portion into the cache memory instead of caching all the feature information into the cache memory. For example, the processor may determine a characteristic value of a selected feature dimension among the feature values of the selected feature dimension of the entities. For each entity whose feature value of the selected feature dimension is unequal to the characteristic value, the processor may cache the corresponding feature value of the selected feature dimension into the cache memory. For each entity whose feature value of the selected feature dimension is equal to the characteristic value, the processor may leave the corresponding feature value of the selected feature dimension without caching. When receiving a query request, the processor may perform a search in the cache memory and generate a query result based on the search result and a corresponding relationship between the selected feature dimension and the characteristic value. In this way, the amount of data cached into the cache memory may be reduced without losing information.
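Merely by way of illustration, the following minimal Python sketch shows why omitting the characteristic value from the cache need not lose information. The sample records, the dimension names, and the helper names build_cache and query are hypothetical and are not part of the disclosure; an in-memory dictionary stands in for the cache memory.

```python
from typing import Dict

Record = Dict[str, object]

# Hypothetical sample data; dimension names and values are illustrative only.
RECORDS: Dict[str, Record] = {
    "E1": {"gender": "Female", "orders": 0},
    "E2": {"gender": "Female", "orders": 0},
    "E3": {"gender": "Male", "orders": 5},
}
# Corresponding relationship: feature dimension -> characteristic value.
CHARACTERISTIC: Record = {"gender": "Female", "orders": 0}


def build_cache(records, characteristic):
    """Cache only feature values that are unequal to the characteristic value."""
    return {
        entity_id: {d: v for d, v in record.items() if v != characteristic[d]}
        for entity_id, record in records.items()
    }


def query(cache, characteristic, entity_id, dimension):
    """Search the cache; fall back to the characteristic value on an empty return."""
    cached_value = cache[entity_id].get(dimension)
    return characteristic[dimension] if cached_value is None else cached_value


cache = build_cache(RECORDS, CHARACTERISTIC)
restored = {
    entity_id: {d: query(cache, CHARACTERISTIC, entity_id, d) for d in CHARACTERISTIC}
    for entity_id in RECORDS
}
cached_count = sum(len(row) for row in cache.values())
print(cached_count, restored == RECORDS)  # prints: 2 True
```

In this sketch only two of the six feature values are cached, yet every original record can be restored from the cache together with the corresponding relationship.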

Some embodiments of the present disclosure will be described below in detail with reference to the following drawings. The embodiments and features in the embodiments described below may be combined with each other.

FIG. 1 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure. In some embodiments, the process 100 may be implemented on an electronic device (e.g., a smartphone, a tablet, a personal computer). The process 100 may be applied to a data set including at least one feature dimension.

In some embodiments, one or more operations of the process 100 may be executed by the data processing system 1100 as illustrated in FIG. 11. For example, one or more operations of the process 100 may be stored in a storage device (e.g., a storage device 1140, the ROM 1230, the RAM 1240, the storage 1390) as a set of instructions. In some embodiments, the server 1110 (e.g., the processing engine 1112 in the server 1110, the processor 1220 of the processing engine 1112), the terminal 1130, or a data processing device (e.g., any one of devices 600 to 900) may execute the set of instructions. For illustration purposes, the implementation of the process 100 by the processing engine 1112 is described as an example.

In 101, for each feature dimension i of an original data set, the processing engine 1112 may select a characteristic value Mi from a plurality of feature values of the feature dimension i. The processing engine 1112 may also record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi. In some embodiments, the original data set may include at least one feature dimension.

The original data set may be a data source for data processing, and include at least one feature dimension. The original data set may include, for example, user data of an Internet platform. The Internet platform may be a car hailing application, a website for user transactions, a website for user communication, etc. The user data may include features of at least one dimension. Each dimension of features may be represented as a feature dimension. Each feature dimension may include at least one feature value. The feature value(s) of a feature dimension may be discrete or continuous.

In some embodiments, the original data set may include feature information of a plurality of entities. The feature information may include at least one feature dimension for each entity and at least one feature value for each feature dimension. In some embodiments, an entity may refer to something having a real existence, as a subject or as an object, currently or potentially, concretely or abstractly, physically or virtually. For example, the plurality of entities may include at least one of service requesters, service providers, or service orders in an online to offline (O2O) service system. In some embodiments, the original data set may be stored in a storage device (e.g., a storage device 1140, a ROM 1230, a storage 1390) of the data processing system 1100.

In some embodiments, the feature dimension i may also be referred to as a selected feature dimension of the at least one feature dimension of the entities. In some embodiments, the selected feature dimension may be selected from the at least one feature dimension randomly or according to a selection rule by the processing engine 1112. Additionally or alternatively, the selected feature dimension may be selected by a user manually via a terminal 1130. In certain embodiments, a portion or all of the at least one feature dimension of the entities may be selected as the selected feature dimension(s). The processing engine 1112 may select and/or determine a characteristic value for each selected feature dimension.

In some embodiments, the original data set may be a data set related to a plurality of users of a car hailing application. The original data set may include three feature dimensions, such as the age, the gender, and the number of orders in the last 30 days of the users as shown in Table 1. As shown in Table 1, the user ID may be a user number registered by a user in the car hailing application, which is used to identify the user. The age of users may be represented by interval numbers. For example, “70s” represents that the user was born between 1970 and 1979, “80s” represents that the user was born between 1980 and 1989, and “90s” represents that the user was born between 1990 and 1999.

TABLE 1 User Information

User ID   Age   Gender   The Number of Orders in the Last 30 Days
Y001      80s   Female   0
Y002      80s   Female   0
Y003      80s   Male     5
Y004      90s   Female   0
Y005      70s   Male     0

In the above-mentioned example illustrated in Table 1, the feature dimension “age” may include feature values of “70s”, “80s”, and “90s”. The feature dimension “gender” may include feature values of “female” and “male”. The feature dimension “the number of orders in the last 30 days” may include feature values of “0” and “5”.

In some embodiments, the characteristic value Mi of the feature dimension i may be selected according to a preset rule. For example, the characteristic value Mi may be selected based on a statistical distribution of the feature values of the feature dimension i. It should be noted that the characteristic value Mi may be selected by any other means. In some embodiments, a feature value of the feature dimension i with any distribution proportion may be designated as the characteristic value Mi. For example, any one of “70s”, “80s”, and “90s” may be selected as the characteristic value Mi of the feature dimension “age”. In order to further improve the cache efficiency, a feature value having the largest distribution proportion may be selected as the characteristic value Mi. It should be understood that selecting any feature value as the characteristic value Mi may improve the cache efficiency to some extent, since the feature values equal to the characteristic value Mi are left without caching.

As used herein, a distribution proportion of a feature value with respect to a feature dimension refers to a proportion of entities (e.g., users) having the feature value with respect to the feature dimension among all the entities. A feature value having the largest distribution proportion in a feature dimension may also be referred to as a mode among the feature values of the feature dimension. For example, according to Table 1, the distribution proportion of “70s”, “80s”, and “90s” with respect to the feature dimension “age” may be 20%, 60% and 20%, respectively. The feature value “80s” may be the mode among the feature values of the feature dimension “age” and have the largest distribution proportion, which may be designated as the characteristic value of the feature dimension “age”.
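Merely by way of illustration, the distribution proportions and the mode described above may be computed as in the following Python sketch. The function names are hypothetical, and the list of ages is taken from Table 1. When several feature values tie for the largest proportion, this sketch simply returns the one encountered first, which is an assumption rather than a rule stated in the disclosure.

```python
from collections import Counter
from typing import Dict, List

# Ages of users Y001 to Y005, as listed in Table 1.
AGES: List[str] = ["80s", "80s", "80s", "90s", "70s"]


def distribution_proportions(values: List[str]) -> Dict[str, float]:
    """Proportion of entities holding each feature value."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def mode_value(values: List[str]) -> str:
    """Feature value with the largest distribution proportion."""
    return Counter(values).most_common(1)[0][0]


print(distribution_proportions(AGES))  # {'80s': 0.6, '90s': 0.2, '70s': 0.2}
print(mode_value(AGES))                # 80s
```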

The processing engine 1112 may establish a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi to record the corresponding relationship. In some embodiments, the corresponding relationship may be recorded in a mapping table. In some embodiments, different corresponding relationships of different feature dimensions i may be recorded separately in different mapping tables. Additionally or alternatively, different corresponding relationships of different feature dimensions i may be jointly recorded in one mapping table. The mapping table(s) may be stored in a storage device of the data processing system 1100 (e.g., the storage device 1140, the cache memory 1150, the ROM 1230, the RAM 1240, the storage 1390).

For example, if the selected characteristic value Mi of the feature dimension “age” is “70s”, a mapping table of age may be established as shown in Table 2. If the selected characteristic value Mi of the feature dimension “gender” is “male”, a mapping table of gender may be established as shown in Table 3. If the selected characteristic value Mi of the feature dimension “the number of orders in the last 30 days” is “0”, a mapping table of the number of orders in the last 30 days may be established as shown in Table 4. It should be noted that the mapping tables shown in Tables 2, 3 and 4 may be recorded and/or stored jointly in one mapping table.

TABLE 2 Mapping Table of Age

Age   70s

TABLE 3 Mapping Table of Gender

Gender   Male

TABLE 4 Mapping Table of the Number of Orders in the Last 30 Days

The Number of Orders in the Last 30 Days   0
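Merely by way of illustration, the mapping tables of Tables 2 to 4 may be sketched in Python as dictionaries that map a feature dimension to its characteristic value. The key names (e.g., orders_last_30_days) and the helper record_relationship are hypothetical. The sketch shows both the separate recording and the joint recording mentioned above.

```python
from typing import Dict

MappingTable = Dict[str, object]


def record_relationship(table: MappingTable, dimension: str, value: object) -> None:
    """Record the corresponding relationship between a dimension and its characteristic value."""
    table[dimension] = value


# Separate mapping tables, one per feature dimension (Tables 2, 3 and 4).
mapping_age: MappingTable = {}
mapping_gender: MappingTable = {}
mapping_orders: MappingTable = {}
record_relationship(mapping_age, "age", "70s")
record_relationship(mapping_gender, "gender", "Male")
record_relationship(mapping_orders, "orders_last_30_days", 0)

# The same relationships recorded jointly in one mapping table.
joint_mapping: MappingTable = {**mapping_age, **mapping_gender, **mapping_orders}
print(joint_mapping)
# {'age': '70s', 'gender': 'Male', 'orders_last_30_days': 0}
```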

In 102, for each feature dimension i of the original data set, the processing engine 1112 may cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

In some embodiments, if the feature value of the feature dimension i of an entity is unequal to the characteristic value Mi, the processing engine 1112 may cache the corresponding feature value of the feature dimension i of the entity into the cache memory. On the other hand, if the feature value of the feature dimension i of an entity is equal to the characteristic value Mi, the processing engine 1112 may leave the corresponding feature value of the feature dimension i without caching.

For example, the characteristic values Mi of the feature dimensions “age”, “gender”, and “the number of orders in the last 30 days” may be “70s”, “male”, and “0”, respectively. The data cached into the cache memory may be shown as Table 5, in which “NULL” represents that the corresponding feature value(s) are null in the cache memory. It should be noted that the cached data illustrated in Table 5 may be determined according to the characteristic values Mi selected in operation 101. This is not intended to be limiting, and the cached data may not be limited to the example illustrated in Table 5.

TABLE 5 User Information Cached into the Cache Memory
User ID | Age | Gender | The Number of Orders in the Last 30 Days
Y001 | 80s | Female | NULL
Y002 | 80s | Female | NULL
Y003 | 80s | NULL | 5
Y004 | 90s | Female | NULL
Y005 | NULL | NULL | NULL
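Operation 102 may be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the names ORIGINAL, MAPPING, build_cache, and the key orders_30d are hypothetical, and the original rows are an assumption, inferred so that they agree with Table 5 and the characteristic values of Tables 2, 3 and 4. An absent key plays the role of “NULL”.

```python
# Hypothetical original rows, inferred so that they agree with Table 5.
ORIGINAL = {
    "Y001": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y002": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y003": {"age": "80s", "gender": "male", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": "female", "orders_30d": 0},
    "Y005": {"age": "70s", "gender": "male", "orders_30d": 0},
}

# Characteristic value Mi per feature dimension (Tables 2, 3 and 4).
MAPPING = {"age": "70s", "gender": "male", "orders_30d": 0}


def build_cache(original, mapping):
    """Keep only the feature values that differ from the characteristic value."""
    cache = {}
    for entity_id, features in original.items():
        cache[entity_id] = {
            dim: value
            for dim, value in features.items()
            if dim not in mapping or value != mapping[dim]
        }
    return cache


cache = build_cache(ORIGINAL, MAPPING)
```

In this sketch, the entry for “Y005” is empty because all three of its feature values equal the characteristic values, which corresponds to the row of three “NULL” entries in Table 5.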

In 103, the processing engine 1112 may perform a search (also referred to as a first search) in the cache memory in response to a query request.

The query request may include a single query condition or a compound query condition. The single query condition may include only one query condition, while the compound query condition may include at least two query conditions. For example, a query request including a single query condition may be a request to search data (e.g., the age, the gender, and the number of orders in the last 30 days) of a user whose ID is “Y001”, or a request to search data of all users whose age is “80s”. A query request including a compound query condition may be a request to search data of all users whose age is “80s” or “90s”, or a request to search data of all users whose age is “80s” and gender is “female”.

Upon receiving the query request, the processing engine 1112 may perform the search in the cache memory according to the query request. In some embodiments, if the returned value(s) of the search based on the cached data are not null (that is, the search result does not include any empty return), the processing engine 1112 may generate a query result based on the returned value(s). Merely by way of example, if the query condition of the query request is to search the age of a user whose ID is “Y001”, the corresponding query result generated based on the returned value(s) may be “80s”. If the query condition of the query request is to search the gender of a user whose ID is “Y002”, the corresponding query result generated based on the returned value(s) may be “female”. If the query condition of the query request is to search the number of orders in the last 30 days of a user whose ID is “Y003”, the corresponding query result generated based on the returned value(s) may be “5”. If the query condition of the query request is to search users whose age is “80s” and gender is “female”, the corresponding query result generated based on the returned value(s) may be “Y001 and Y002”. In some embodiments, if one or more returned values of the search based on the cached data are null (that is, the search result includes one or more empty returns), the processing engine 1112 may proceed to operation 104.

In 104, the processing engine 1112 may replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s). The processing engine 1112 may then generate the query result based on the replacement result.

When a returned value of the search based on the cached data is null (also referred to as “empty”), the processing engine 1112 may extract a feature dimension corresponding to the query condition according to the query request. The processing engine 1112 may replace the null value(s) of the extracted feature dimension in the cache memory by the corresponding characteristic value Mi based on the recorded corresponding relationship of the extracted feature dimension and its characteristic value Mi. The processing engine 1112 may then generate the query result according to the replacement result.

For example, if the query condition of the query request is to search the gender of a user whose ID is “Y003”, the corresponding feature value may be null in the cache memory, which may result in a null return value. The processing engine 1112 may extract the feature dimension corresponding to the query condition based on the query request, that is, the feature dimension “gender”. According to the pre-stored mapping table of gender (e.g., Table 3), the characteristic value of the feature dimension “gender” may be “male”. The processing engine 1112 may replace the null feature value(s) of the feature dimension “gender” in the cache memory by “male”. In that case, the replacement result of the null feature value(s) of the feature dimension “gender” may be “male”. The processing engine 1112 may then generate the query result based on the replacement result. For example, the query result may be that “the gender of the user whose ID is ‘Y003’ is male”.

As another example, if the query condition of the query request is to search the age and the number of orders in the last 30 days of a user whose ID is “Y001”, two feature values A and B may be involved. The feature value A may be “80s”, which is the age of the user whose ID is “Y001”. The feature value B, which is the number of orders in the last 30 days of the user whose ID is “Y001”, may be null in the cache memory. According to the feature value A, the processing engine 1112 may generate a query result, e.g., “the age of the user whose ID is ‘Y001’ is 80s”. For the feature value B, the processing engine 1112 may extract a feature dimension corresponding to the query condition based on the query request, that is, the feature dimension “the number of orders in the last 30 days”. Based on the pre-stored mapping table of the number of orders in the last 30 days (e.g., Table 4), the processing engine 1112 may replace the null feature value(s) of the feature dimension “the number of orders in the last 30 days” in the cache memory by “0”. In that case, the replacement result of the null feature value(s) of the feature dimension “the number of orders in the last 30 days” may be “0”. The processing engine 1112 may generate the query result based on the replacement result. For example, the query result may be that “the number of orders in the last 30 days of the user whose ID is ‘Y001’ is 0”.
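The lookup of operations 103 and 104 may be sketched as follows. This is a minimal illustration under stated assumptions: CACHE mirrors Table 5 with an absent key standing for “NULL”, MAPPING mirrors Tables 2, 3 and 4, and the names query_feature and orders_30d are hypothetical. The sketch substitutes the characteristic value when the value is read and does not write it back into the cache, which is a simplification of the replacement described above.

```python
# Table 5 as a dictionary; a missing key stands for NULL.
CACHE = {
    "Y001": {"age": "80s", "gender": "female"},
    "Y002": {"age": "80s", "gender": "female"},
    "Y003": {"age": "80s", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": "female"},
    "Y005": {},
}

# Recorded corresponding relationships (Tables 2, 3 and 4).
MAPPING = {"age": "70s", "gender": "male", "orders_30d": 0}


def query_feature(cache, mapping, entity_id, dim):
    """Return one feature value of one entity, falling back to Mi on a null."""
    record = cache[entity_id]  # raises KeyError for an unknown entity
    if dim in record:
        return record[dim]  # operation 103: the first search returned a value
    return mapping[dim]  # operation 104: null replaced by the characteristic value


gender_y003 = query_feature(CACHE, MAPPING, "Y003", "gender")
age_y001 = query_feature(CACHE, MAPPING, "Y001", "age")
orders_y001 = query_feature(CACHE, MAPPING, "Y001", "orders_30d")
```

The three calls correspond to the two examples above: the gender of “Y003” resolves through Table 3, the age of “Y001” is served from the cache, and the number of orders of “Y001” resolves through Table 4.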

In the above-mentioned example as shown in Table 5, with 5 users, the cached user information may have 7 fewer fields than the original data set. The mapping table of feature dimensions and characteristic values may need 6 fields (one feature dimension and one characteristic value for each of the three feature dimensions) so that the user information remains complete. Thus, according to the data processing process disclosed in the present disclosure, the cached user information and the mapping table together may need 1 fewer field in the cache memory than the original data set. Compared with caching the original data set, the data processing process disclosed in the present disclosure may save a large number of fields when there are a great number of users (e.g., 200 million) and hundreds of feature dimensions. In this way, the total amount of cached data may be compressed, the cache efficiency may be improved, and the caching cost may be reduced without losing information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.
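The field count above may be checked with a short sketch. The dictionary CACHE mirrors Table 5 (an absent key stands for “NULL”); the function name field_accounting and the convention of two fields per mapping entry are assumptions made for illustration.

```python
# Table 5 as a dictionary; a missing key stands for NULL.
CACHE = {
    "Y001": {"age": "80s", "gender": "female"},
    "Y002": {"age": "80s", "gender": "female"},
    "Y003": {"age": "80s", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": "female"},
    "Y005": {},
}
MAPPING = {"age": "70s", "gender": "male", "orders_30d": 0}


def field_accounting(cache, mapping):
    """Return (fields omitted from the cache, fields used by the mapping, net saving)."""
    omitted = sum(
        1 for record in cache.values() for dim in mapping if dim not in record
    )
    # Assumption: each mapping entry costs two fields, the dimension and its Mi.
    mapping_fields = 2 * len(mapping)
    return omitted, mapping_fields, omitted - mapping_fields


omitted, mapping_fields, net = field_accounting(CACHE, MAPPING)
```

The mapping cost stays fixed at two fields per feature dimension, while the number of omitted fields grows with the number of entities, which is why the saving becomes significant for large data sets.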

In some embodiments, the data processing process described above may be implemented on an online service system. The online service system may be configured with a cache cluster and an access server. User feature data may be cached in a memory of the cache cluster, which can increase the speed of accessing data. The corresponding relationship of a feature dimension and its characteristic value may be stored locally on the access server, or in another storage device of the online service system. The access server may access the corresponding relationship when a feature value is null in the cache memory. In some embodiments, when the online service system adopts a machine learning or deep learning technique, the online service system may use the data processing process in machine learning or deep learning models, taking advantage of artificial intelligence.

It should be noted that the present disclosure takes discrete feature values of each feature dimension as an example for description. In some embodiments, the feature values of a feature dimension may be continuous. In addition, in the above exemplary embodiments, the data processing process of the present disclosure is implemented on all feature dimensions. In some embodiments, the process may be implemented on a portion of the feature dimensions.

In some embodiments, the processing engine 1112 may divide the feature values of a feature dimension i into one or more intervals. The processing engine 1112 may further designate one of the interval(s) as a characteristic interval. For the feature value(s) of the feature dimension not within the characteristic interval, the processing engine 1112 may cache the feature value(s) into the cache memory. For the feature value(s) of the feature dimension within the characteristic interval, the processing engine 1112 may leave the feature value(s) without caching. Merely by way of example, the processing engine 1112 may divide the feature values of the total number of orders into a plurality of intervals, such as 0-5, 6-10, 11-15, and more than 15, among which 0-5 is designated as a characteristic interval corresponding to the total number of orders. The feature value(s) of the total number of orders not within 0-5 may be cached into the cache memory, while the feature value(s) of the total number of orders within 0-5 may be left without caching.
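The interval-based variant may be sketched as follows. The user IDs and order totals are hypothetical sample data, the interval is treated as inclusive at both ends, and the name cache_by_interval is an assumption; the sketch covers only the caching decision.

```python
# Characteristic interval for the total number of orders (inclusive bounds).
CHARACTERISTIC_INTERVAL = (0, 5)


def cache_by_interval(totals, interval):
    """Cache only the values that fall outside the characteristic interval."""
    low, high = interval
    return {
        entity_id: value
        for entity_id, value in totals.items()
        if not (low <= value <= high)
    }


# Hypothetical totals, not taken from the tables of the disclosure.
totals = {"U1": 3, "U2": 8, "U3": 0, "U4": 17}
cached = cache_by_interval(totals, CHARACTERISTIC_INTERVAL)
```

With this scheme, a missing entry only indicates that the value lies within 0-5; the exact value is not recoverable from the cache and the interval alone.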

The technical solutions disclosed in the present disclosure may include the following beneficial effects.

For each feature dimension i in the original data set, the processing engine 1112 may select a characteristic value Mi from a plurality of feature values of the feature dimension i, and record a corresponding relationship between the feature dimension i and Mi. The processing engine 1112 may cache feature values of the feature dimension i other than the characteristic value Mi for each feature dimension i of the original data set into a cache memory. In this case, the amount of cached data may be reduced. The processing engine 1112 may perform a search in the cache memory in response to a query request. If the returned value(s) of the search based on the cached data are not null, the processing engine 1112 may generate a query result according to the returned value(s). On the other hand, if the returned value(s) of the search based on the cached data include a null value, the processing engine 1112 may replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship. The processing engine 1112 may generate a query result based on the replacement result. In this case, the total amount of cached data may be compressed. The cache efficiency may be improved, and the caching cost may be reduced without losing information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.

FIG. 2 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure. In some embodiments, a feature value with the largest distribution proportion of a feature dimension i may be determined as a characteristic value Mi of the feature dimension i. The data processing process 200 (also referred to as process 200) of the present disclosure may greatly improve the cache efficiency, save the memory space of the cache memory, and reduce the caching cost.

In some embodiments, the process 200 may be an embodiment of the process 100 as described in connection with FIG. 1. In some embodiments, one or more operations of process 200 may be executed by the data processing system 1100 as illustrated in FIG. 11. For example, one or more operations of the process 200 may be stored in a storage device (e.g., the storage device 1140, the ROM 1230, the RAM 1240, the storage 1390) as a set of instructions. In some embodiments, the server 1110 (e.g., the processing engine 1112 in the server 1110, or the processor 1220 of the processing engine 1112 in the server 1110), the terminal 1130, or a data processing device (e.g., any one of devices 600 to 900) may execute the set of instructions. For illustration purposes, the implementation of the process 200 by the processing engine 1112 is described as an example.

In 201, for each feature dimension i of an original data set, the processing engine 1112 may determine a feature value with the largest distribution proportion among a plurality of feature values of the feature dimension i as a characteristic value Mi. The processing engine 1112 may record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

In some embodiments, the original data set may be as shown in Table 1. The processing engine 1112 may determine a feature value having the largest distribution proportion of each feature dimension i as the characteristic value Mi of the feature dimension i based on a statistical distribution of the feature values of the feature dimension i. For example, for the feature dimension “age”, users whose age is “80s” have the largest distribution proportion. Thus, “80s” may be determined as the characteristic value Mi of the feature dimension “age”. Similarly, “female” may be determined as the characteristic value Mi of the feature dimension “gender”, and “0” may be determined as the characteristic value Mi of the feature dimension “the number of orders in the last 30 days”. A corresponding relationship between the feature dimension “age” and its corresponding characteristic value Mi (also referred to as a mapping table of age) may be established as shown in Table 6. A corresponding relationship between the feature dimension “gender” and its corresponding characteristic value Mi (also referred to as a mapping table of gender) may be established as shown in Table 7. A corresponding relationship between the feature dimension “the number of orders in the last 30 days” and its corresponding characteristic value Mi (also referred to as a mapping table of the number of orders in the last 30 days) may be established as shown in Table 4 described in connection with FIG. 1.

TABLE 6 Mapping Table of Age
Age | 80s

TABLE 7 Mapping Table of Gender
Gender | Female
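The selection in operation 201 may be sketched with a frequency count per feature dimension. The rows in ORIGINAL are an assumption, chosen to be consistent with the cached data and characteristic values described for this example; the names select_characteristic_values and orders_30d are hypothetical.

```python
from collections import Counter

# Hypothetical original rows, consistent with the examples of this disclosure.
ORIGINAL = {
    "Y001": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y002": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y003": {"age": "80s", "gender": "male", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": "female", "orders_30d": 0},
    "Y005": {"age": "70s", "gender": "male", "orders_30d": 0},
}


def select_characteristic_values(original):
    """Pick, per feature dimension, the value with the largest share of entities."""
    counters = {}
    for features in original.values():
        for dim, value in features.items():
            counters.setdefault(dim, Counter())[value] += 1
    # most_common(1) returns a one-element list of (value, count) pairs.
    return {dim: counter.most_common(1)[0][0] for dim, counter in counters.items()}


mapping = select_characteristic_values(ORIGINAL)
```

The resulting mapping corresponds to Tables 6, 7 and 4. How ties between equally frequent values should be resolved is not specified in the description; the sketch simply takes whichever value the counter reports first.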

In 202, for each feature dimension i of the original data set, the processing engine 1112 may cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory. Operation 202 may be performed in a manner similar to operation 102, and the descriptions thereof are not repeated here.

For example, according to the characteristic value(s) Mi determined in operation 201, the data cached into the cache memory may be as shown in Table 8.

TABLE 8 User Information Cached into the Cache Memory
User ID | Age | Gender | The Number of Orders in the Last 30 Days
Y001 | NULL | NULL | NULL
Y002 | NULL | NULL | NULL
Y003 | NULL | Male | 5
Y004 | 90s | NULL | NULL
Y005 | 70s | Male | NULL

In 203, the processing engine 1112 may perform a search in the cache memory in response to a query request.

In 204, the processing engine 1112 may replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s). The processing engine 1112 may generate a query result based on the replacement result.

Operations 203 and 204 may be performed in a manner similar to operations 103 and 104, respectively, and the descriptions thereof are not repeated here.

In the above-mentioned example as shown in Table 8, with 5 users, the cached user information may have 10 fewer fields than the original data set. The mapping table of feature dimensions and characteristic values may need 6 fields so that the user information remains complete. Thus, according to the data processing process disclosed in the present disclosure, the cached user information and the mapping table together may need 4 fewer fields in the cache memory than the original data set. Compared with caching the original data set, the data processing process disclosed in the present disclosure may save a large number of fields when there are a great number of users (e.g., 200 million) and hundreds of feature dimensions. In this way, the total amount of cached data may be greatly compressed, the cache efficiency may be greatly improved, and the caching cost may be greatly reduced without losing information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.

In some embodiments, the feature value having the largest distribution proportion of the feature dimension i may be determined as the characteristic value Mi of the feature dimension i. This may greatly compress the total amount of cached data, improve the cache efficiency, and reduce the caching cost without a loss of information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.
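The effect of choosing the most frequent value may be illustrated by comparing the net saving for the two sets of characteristic values used in the examples. The rows in ORIGINAL are an assumption consistent with Tables 5 and 8, and net_saving is a hypothetical helper that counts two fields per mapping entry.

```python
# Hypothetical original rows, consistent with Tables 5 and 8.
ORIGINAL = {
    "Y001": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y002": {"age": "80s", "gender": "female", "orders_30d": 0},
    "Y003": {"age": "80s", "gender": "male", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": "female", "orders_30d": 0},
    "Y005": {"age": "70s", "gender": "male", "orders_30d": 0},
}


def net_saving(original, mapping):
    """Fields left out of the cache minus the fields the mapping itself needs."""
    omitted = sum(
        1
        for features in original.values()
        for dim, value in features.items()
        if dim in mapping and mapping[dim] == value
    )
    return omitted - 2 * len(mapping)


# Characteristic values of Tables 2, 3 and 4 (process 100 example).
saving_arbitrary = net_saving(ORIGINAL, {"age": "70s", "gender": "male", "orders_30d": 0})
# Characteristic values of Tables 6, 7 and 4 (most frequent values, process 200).
saving_most_frequent = net_saving(ORIGINAL, {"age": "80s", "gender": "female", "orders_30d": 0})
```

On this small sample the saving rises from 1 field to 4 fields, matching the counts given for Table 5 and Table 8.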

It should be noted that the original data set may need to be updated due to actual conditions, such as registrations of new users, massive data updating, policy changes of the platform, etc. Therefore, the corresponding relationship between a feature dimension and its characteristic value may need to be updated to ensure the accuracy of the query result. The update of the original data set and/or the corresponding relationship may be caused by various factors, such as but not limited to a manual input (e.g., an instruction of a user received from a terminal 1130), a time condition (e.g., a requirement of periodical update or real-time update), a certain event (e.g., a massive data updating, registrations of new users), etc.

In 205, the processing engine 1112 may update the original data set. The updated original data set may be regarded as a new data source on which the data processing process disclosed in the present disclosure is implemented. In some embodiments, the updated original data set may include feature information of one or more new entities. Additionally or alternatively, the updated original data set may include updated feature information of the original entities, such as but not limited to updated feature values of the original feature dimension(s), or feature information of one or more new feature dimensions.

In 206, the processing engine 1112 may update a distribution proportion of each feature value of the feature dimension i based on the updated original data set.

Merely by way of example, “female” may initially be the feature value that has the largest distribution proportion in the feature dimension “gender”. After the original data set is updated, “male” may become the feature value that has the largest distribution proportion in the feature dimension “gender”.

In 207, the processing engine 1112 may update the characteristic value Mi based on the updated distribution proportion(s) of feature value(s) in the feature dimension i.

In the above-mentioned example, the characteristic value Mi of the feature dimension “gender” may change from “female” to “male”.

In 208, the processing engine 1112 may update the corresponding relationship between the feature dimension i and Mi based on the updated characteristic value Mi.

In the above-mentioned example, the corresponding relationship between the feature dimension “gender” and its characteristic value Mi may be updated. For example, the mapping table of gender as shown in Table 7 may be updated to the mapping table of gender as shown in Table 3.

In some embodiments, after operations 205 to 208, the processing engine 1112 may proceed to operations 202 to 204. In 202, for each feature dimension i in the updated original data set, the processing engine 1112 may cache feature value(s) other than the corresponding updated characteristic value Mi into the cache memory.

In some embodiments, after 204, the processing engine 1112 may determine whether a condition of data updating is satisfied. In response to a determination that the condition of data updating is satisfied, the processing engine 1112 may proceed to operation 205 to update the original data set. The condition of data updating may be related to, for example but not limited to, the change of the number of entities in the original data set, an instruction to update data received from a user, the interval between the current time and the last data updating, or the like, or any combination thereof. For example, the processing engine 1112 may determine whether the change in the number of the entities in the original data set exceeds a threshold.
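The update loop of operations 205 to 208, followed by a return to operation 202, may be sketched as follows for the “gender” example. The threshold value, the added users “Y006” and “Y007”, and the names needs_update and refresh are assumptions made for illustration only.

```python
from collections import Counter


def needs_update(previous_count, current_count, threshold):
    """Update condition: the number of entities changed by more than a threshold."""
    return abs(current_count - previous_count) > threshold


def refresh(original):
    """Operations 206 to 208, then operation 202: recompute Mi and rebuild the cache."""
    dims = {dim for features in original.values() for dim in features}
    mapping = {
        dim: Counter(
            f[dim] for f in original.values() if dim in f
        ).most_common(1)[0][0]
        for dim in dims
    }
    cache = {
        entity_id: {d: v for d, v in features.items() if v != mapping[d]}
        for entity_id, features in original.items()
    }
    return mapping, cache


genders = {"Y001": "female", "Y002": "female", "Y003": "male",
           "Y004": "female", "Y005": "male"}
original = {entity_id: {"gender": g} for entity_id, g in genders.items()}
mapping_before, cache = refresh(original)
previous_count = len(original)

# Operation 205: two hypothetical new users register.
for entity_id in ("Y006", "Y007"):
    original[entity_id] = {"gender": "male"}

mapping_after = mapping_before
if needs_update(previous_count, len(original), threshold=1):
    mapping_after, cache = refresh(original)
```

After the update, “male” has the largest distribution proportion, so the rebuilt cache stores the gender of the female users and leaves the gender of the male users uncached.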

In some embodiments, the processing engine 1112 may perform the process 200 on each feature dimension of the entities in the original data set. Alternatively, the processing engine 1112 may perform the process 200 on a portion of the feature dimension(s) of the entities in the original data set.

In the present disclosure, the feature value having the largest distribution proportion of a feature dimension i may be determined as the characteristic value Mi of the feature dimension i, which may greatly compress the total amount of cached data, improve the cache efficiency, and reduce the caching cost without losing information. Besides, the cached data and the corresponding relationship(s) between feature dimension(s) and the corresponding characteristic value(s) may be updated based on the updated original data set, which may ensure the accuracy of the query result.

FIG. 3 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure.

In some embodiments, process 300 may be an embodiment of the process 100 as described in connection with FIG. 1. In some embodiments, one or more operations of process 300 may be executed by the data processing system 1100 as illustrated in FIG. 11. For example, one or more operations of the process 300 may be stored in a storage device (e.g., the storage device 1140, the ROM 1230, the RAM 1240, the storage 1390) as a set of instructions. In some embodiments, the server 1110 (e.g., the processing engine 1112 in the server 1110, or the processor 1220 of the processing engine 1112 in the server 1110), the terminal 1130, or a data processing device (e.g., any one of devices 600 to 900) may execute the set of instructions. For illustration purposes, the implementation of the process 300 by the processing engine 1112 is described as an example.

In 301, for each feature dimension i of an original data set, the processing engine 1112 may select a characteristic value Mi from a plurality of feature values of the feature dimension i. The processing engine 1112 may record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

In 302, for each feature dimension i of the original data set, the processing engine 1112 may cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

Operations 301 and 302 may be performed in a manner similar to operations 101 and 102, respectively, and the descriptions thereof are not repeated here.

In some embodiments, the original data set may be as shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be as shown in Tables 4, 6, and 7. The cached data may be as shown in Table 8.

In 303, in response to a query request, the processing engine 1112 may determine, based on the recorded corresponding relationship(s), whether the query request includes a query condition that includes a feature dimension i and its corresponding characteristic value Mi. In other words, the processing engine 1112 may determine whether the query request is related to a feature dimension i and its corresponding characteristic value Mi. In response to a determination that the query request includes such a query condition, the processing engine 1112 may proceed to operation 304.

Merely by way of example, a query request is to search the IDs of users who satisfy a query condition “age=80s”. Upon receiving the query request, the processing engine 1112 may determine that the query condition includes the feature dimension “age” and its characteristic value “80s” according to the mapping table of age as shown in Table 6. Then the processing engine 1112 may proceed to operation 304.

In 304, the processing engine 1112 may replace the characteristic value Mi in the query condition of the query request by a null value. In other words, the processing engine 1112 may update the query request with an empty entry (e.g., the null value). The updated query request may include the feature dimension i and the empty entry. The processing engine 1112 may then perform a search in the cache memory based on the updated query request.

In the above-mentioned example, the original query condition for searching users' IDs may be “age=80s”, which may be updated to a query condition “age=null”. Therefore, the processing engine 1112 may generate a query result that satisfies the original query condition by performing the search in the cache memory based on the updated query request. For example, the query result may be Y001, Y002, and Y003.

In 305, the processing engine 1112 may replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s). The processing engine 1112 may generate a query result based on the replacement result.

Operation 305 may be performed in a manner similar to operation 104, and the descriptions thereof are not repeated here.

Merely by way of example, the query request is to search data (e.g., the gender, the age, and the number of orders in the last 30 days) of all users who satisfy the query conditions “age=80s” and “gender=male”. Upon receiving the query request, the processing engine 1112 may determine that the query request includes a query condition including the feature dimension “age” and the characteristic value “80s” based on the mapping table of age as shown in Table 6. The processing engine 1112 may replace the characteristic value “80s” in the query condition of the query request by a null value. Thus, the updated query request may be to search data (e.g., the gender, the age, and the number of orders in the last 30 days) of all users who satisfy the updated query conditions “age=Null” and “gender=male”. The processing engine 1112 may perform the search in the cache memory based on the updated query request. The query result may be as shown in Table 9.

TABLE 9 Query Result
User ID | Age | Gender | The Number of Orders in the Last 30 Days
Y003 | 80s | Male | 5
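Operations 303 to 305 may be sketched as a query rewrite followed by a search and a fill step. The dictionary CACHE mirrors Table 8 with None standing for “NULL”, MAPPING mirrors Tables 6, 7 and 4, and the function names and the key orders_30d are hypothetical. Only conjunctions of equality conditions are handled in this sketch.

```python
# Table 8 as a dictionary; None stands for NULL.
CACHE = {
    "Y001": {"age": None, "gender": None, "orders_30d": None},
    "Y002": {"age": None, "gender": None, "orders_30d": None},
    "Y003": {"age": None, "gender": "male", "orders_30d": 5},
    "Y004": {"age": "90s", "gender": None, "orders_30d": None},
    "Y005": {"age": "70s", "gender": "male", "orders_30d": None},
}
MAPPING = {"age": "80s", "gender": "female", "orders_30d": 0}


def rewrite_conditions(conditions, mapping):
    """Operations 303 and 304: replace Mi in a condition by a null value."""
    return {
        dim: (None if dim in mapping and mapping[dim] == value else value)
        for dim, value in conditions.items()
    }


def search(cache, conditions):
    """Return the IDs of entities whose cached values match every condition."""
    return [
        entity_id
        for entity_id, record in cache.items()
        if all(record[dim] == value for dim, value in conditions.items())
    ]


def fill(record, mapping):
    """Operation 305: replace nulls in a returned record by Mi."""
    return {dim: (mapping[dim] if value is None else value)
            for dim, value in record.items()}


def run_query(cache, mapping, conditions):
    updated = rewrite_conditions(conditions, mapping)
    return {entity_id: fill(cache[entity_id], mapping)
            for entity_id in search(cache, updated)}


result = run_query(CACHE, MAPPING, {"age": "80s", "gender": "male"})
ids_80s = list(run_query(CACHE, MAPPING, {"age": "80s"}))
```

The first query reproduces the single row of Table 9, and the second reproduces the result “Y001, Y002, and Y003” of the “age=80s” example.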

As an alternative to the approach above for 305, in some embodiments, the processing engine 1112 may generate the query result directly based on the search result of the search performed in operation 304. Merely by way of example, the updated query request including the updated query condition may be a request to search the IDs of users whose gender is “Null”. By searching the user information in the cache memory as illustrated in Table 8 according to the updated query request, the processing engine 1112 may generate the search result, that is, “Y001, Y002, and Y004”.

Alternatively, in certain embodiments, when the search results include the characteristic value of a selected feature dimension and the associated feature dimensions (e.g., the User ID), the processing engine 1112 may further determine whether the search result of the search includes an empty return (for the characteristic value). If the search result includes an empty return, the processing engine 1112 may generate the query result based on the search result and the corresponding relationship(s) between feature dimensions and characteristic values. Merely by way of example, the updated query request including the updated query condition may be a request to search the number of orders in the last 30 days of a user whose age is “90s” and gender is “Null”. According to the user information in the cache memory as illustrated in Table 8, the search result from an initial search (which may be referred to as a first search) may be “Null”. The processing engine 1112 may further generate the query result by replacing the “Null” of the feature dimension “the number of orders in the last 30 days” by the corresponding characteristic value (e.g., “0” according to Table 4).

In the processes herein described, if the query request includes a query condition including a feature dimension i and its corresponding characteristic value Mi, the processing engine 1112 may update the query request. The processing engine 1112 may then perform a search in the cache memory based on the updated query request. This may avoid a failure in obtaining the query result according to the original query request.

FIG. 4 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure. In some embodiments, in process 400, in response to a query request, null feature value(s) of a feature dimension in the cache memory may be replaced by the corresponding characteristic value Mi based on the recorded corresponding relationship. A search may then be performed in the cache memory based on the query request. The process 400 disclosed in the present disclosure can avoid a failure in obtaining a query result according to the query request.

In some embodiments, process 400 may be an embodiment of the process 100 as described in connection with FIG. 1. In some embodiments, one or more operations of process 400 may be executed by the data processing system 1100 as illustrated in FIG. 11. For example, one or more operations in the process 400 may be stored in a storage device (e.g., the storage device 1140, the ROM 1230, the RAM 1240, the storage 1390) as a set of instructions. In some embodiments, the server 1110 (e.g., the processing engine 1112 in the server 1110, or the processor 1220 of the processing engine 1112 in the server 1110), the terminal 1130, or a data processing device (e.g., any one of devices 600 to 900) may execute the set of instructions. For illustration purposes, the implementation of the process 400 by the processing engine 1112 is described as an example.

In 401, for each feature dimension i of an original data set, the processing engine 1112 may select a characteristic value Mi from a plurality of feature values of the feature dimension i. The processing engine 1112 may record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

In 402, for each feature dimension i of the original data set, the processing engine 1112 may cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

Operations 401 and 402 may be performed in a manner similar to operations 101 and 102, respectively, and the descriptions thereof are not repeated herein.

In some embodiments, the original data set may be as shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be as shown in Tables 4, 6, and 7. The cached data may be as shown in Table 8.

In 403, in response to a query request, the processing engine 1112 may replace null feature value(s) in the cache memory by the corresponding characteristic value(s) Mi based on the corresponding relationship(s). In other words, for each feature dimension i, the processing engine 1112 may cache the corresponding characteristic value Mi into the cache memory for each entity whose feature dimension i has an empty entry. The processing engine 1112 may then perform a search in the cache memory based on the query request.

For example, the processing engine 1112 may replace feature values that are “NULL” in Table 8 by the corresponding characteristic values according to the corresponding relationships as shown in Tables 4, 6 and 7. Merely by way of example, the processing engine 1112 may replace the “Null” feature value(s) of the feature dimension “age” by “80s”. The processing engine 1112 may replace the “Null” feature value(s) of the feature dimension “gender” by “female”. The processing engine 1112 may replace the “Null” feature value(s) of the feature dimension “the number of orders in the last 30 days” by “0”. The cached data after the replacement may be as shown in Table 1.

In 404, the processing engine 1112 may generate a query result based ona search result of the search in the cache memory.

Merely by way of example, the query request may include a query condition to search the total number of orders in the last 30 days of all users whose age is “80s” and gender is “female”. Upon receiving the query request and according to the corresponding relationships in Tables 4, 6, and 7, the processing engine 1112 may replace the “NULL” feature value(s) of the feature dimension “age” by “80s”, the “NULL” feature value(s) of the feature dimension “gender” by “female”, and the “NULL” feature value(s) of the feature dimension “the number of orders in the last 30 days” by “0”. The cached data after the replacement may be shown in Table 1. According to Table 1, data of users who satisfy the query condition of the query request may be shown in Table 10. As shown in Table 10, users “Y001” and “Y002” satisfy the query condition of the query request. The processing engine 1112 may sum up the numbers of orders in the last 30 days of “Y001” and “Y002” to generate the query result. Since the numbers of orders in the last 30 days of “Y001” and “Y002” are both 0, the query result may be 0.

TABLE 10: User Information

User ID   Age   Gender   The Number of Orders in the Last 30 Days
Y001      80s   Female   0
Y002      80s   Female   0

In the processes herein described, upon receiving the query request, the processing engine 1112 may replace the null feature value(s) in the cache memory by the corresponding characteristic value(s) Mi according to the recorded corresponding relationship(s). The processing engine 1112 may then perform a search in the cache memory based on the query request, which can avoid a failure in obtaining a query result according to the query request.
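Operations 403 and 404 may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the dictionary layout, the key name `orders_last_30_days`, and the helper names `fill_all_nulls` and `total_orders` are hypothetical. The data mirror Table 8 and the mappings of Tables 4, 6, and 7, with `None` standing for “NULL”.

```python
ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

# Characteristic values per feature dimension (Tables 6, 7, and 4).
mapping = {"age": "80s", "gender": "female", ORDERS: 0}

# Sparse cache mirroring Table 8; None stands for "NULL".
cache = {
    "Y001": {"age": None, "gender": None, ORDERS: None},
    "Y002": {"age": None, "gender": None, ORDERS: None},
    "Y003": {"age": None, "gender": "male", ORDERS: 5},
    "Y004": {"age": "90s", "gender": None, ORDERS: None},
    "Y005": {"age": "70s", "gender": "male", ORDERS: None},
}


def fill_all_nulls(cache, mapping):
    """Operation 403: replace every null entry by its characteristic value."""
    for row in cache.values():
        for dim, value in row.items():
            if value is None:
                row[dim] = mapping[dim]


def total_orders(cache, age, gender):
    """Operation 404: sum the orders of all users matching the query condition."""
    return sum(
        row[ORDERS]
        for row in cache.values()
        if row["age"] == age and row["gender"] == gender
    )


fill_all_nulls(cache, mapping)
matches = sorted(
    user for user, row in cache.items()
    if row["age"] == "80s" and row["gender"] == "female"
)
result = total_orders(cache, "80s", "female")
```

In this sketch, `matches` contains the users of Table 10 and `result` is the sum of their order counts.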

FIG. 5 is a flowchart illustrating an exemplary process for data processing according to some embodiments of the present disclosure.

In some embodiments, process 500 may be an embodiment of the process 100 as described in connection with FIG. 1. In some embodiments, one or more operations of process 500 may be executed by the data processing system 1100 as illustrated in FIG. 11. For example, one or more operations in the process 500 may be stored in a storage device (e.g., the storage device 1140, the ROM 1230, the RAM 1240, the storage 1390) as a set of instructions. In some embodiments, the server 1110 (e.g., the processing engine 1112 in the server 1110, the processor 1220 of the processing engine 1112 in the server 1110), the terminal 1130, or a data processing device (any one of devices 600 to 900) may execute the set of instructions. For illustration purposes, the implementation of the process 500 by the processing engine 1112 is described as an example.

In 501, for each feature dimension i of an original data set, the processing engine 1112 may select a characteristic value Mi from a plurality of feature values of the feature dimension i. The processing engine 1112 may record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

In 502, for each feature dimension i of the original data set, the processing engine 1112 may cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

Operations 501 and 502 may be performed in a similar manner to operations 101 and 102, respectively, and the descriptions thereof are not repeated herein.

In some embodiments, the original data set may be shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be shown in Tables 4, 6, and 7. The cached data may be shown in Table 8.

In 503, in response to a query request, the processing engine 1112 may determine whether the query request includes a query condition that includes a feature dimension i based on the recorded corresponding relationship(s). In other words, the processing engine 1112 may determine whether the query request is related to a feature dimension i. In response to a determination that the query request includes the query condition that includes the feature dimension i, the processing engine 1112 may proceed to operation 504.

Merely by way of example, the processing engine 1112 may determine whether the query condition of the query request includes at least one of the feature dimensions “gender”, “age”, or “the number of orders in the last 30 days”. In response to a determination that the query condition includes at least one of these feature dimensions, the processing engine 1112 may proceed to operation 504.

In 504, the processing engine 1112 may replace null feature value(s) corresponding to the feature dimension i in the cache memory by the corresponding characteristic value(s) Mi. In other words, the processing engine 1112 may cache the characteristic value(s) Mi of the feature dimension i into the cache memory for each entity whose feature dimension i has an empty entry. The processing engine 1112 may then perform a search in the cache memory based on the query request.

For example, if the query condition of a query request includes the feature dimension “age”, the processing engine 1112 may replace the “NULL” feature value(s) of the feature dimension “age” as shown in Table 8 by “80s”. If the query condition of a query request includes the feature dimension “gender”, the processing engine 1112 may replace the “NULL” feature value(s) of the feature dimension “gender” as shown in Table 8 by “female”. If the query condition of a query request includes the feature dimension “the number of orders in the last 30 days”, the processing engine 1112 may replace the “NULL” feature value(s) of the feature dimension “the number of orders in the last 30 days” as shown in Table 8 by “0”.

In 505, the processing engine 1112 may generate a query result based on a search result of the search in the cache memory.

Merely by way of example, the query request may include a query condition to search the total number of orders in the last 30 days of all users whose age is “80s” and gender is “female”. Upon receiving the query request, according to the corresponding relationships in Tables 6 and 7, the processing engine 1112 may replace the “NULL” feature value(s) of the feature dimension “age” as shown in Table 8 by “80s”, and the “NULL” feature value(s) of the feature dimension “gender” as shown in Table 8 by “female”. The cached data after the replacement may be shown in Table 11. Based on Table 11, data of users who satisfy the query condition of the query request may be shown in Table 12. As shown in Table 12, users “Y001” and “Y002” satisfy the query condition of the query request. The processing engine 1112 may sum up the numbers of orders in the last 30 days of “Y001” and “Y002” to generate the query result. The numbers of orders in the last 30 days of “Y001” and “Y002” remain “NULL” in Table 12 and, according to Table 4, correspond to the characteristic value “0”. Since both numbers are 0, the query result may be 0.

TABLE 11: User Information

User ID   Age   Gender   The Number of Orders in the Last 30 Days
Y001      80s   Female   NULL
Y002      80s   Female   NULL
Y003      80s   Male     5
Y004      90s   Female   NULL
Y005      70s   Male     NULL

TABLE 12: User Information

User ID   Age   Gender   The Number of Orders in the Last 30 Days
Y001      80s   Female   NULL
Y002      80s   Female   NULL

In the processes herein described, in response to a query request, the processing engine 1112 may determine whether the query condition of the query request includes a feature dimension i based on the recorded corresponding relationship. In response to a determination that the query condition includes the feature dimension i, the processing engine 1112 may replace null feature value(s) of the feature dimension i in the cache memory by the corresponding characteristic value Mi. The processing engine 1112 may then perform a search in the cache memory based on the query request. The processes herein described may reduce the amount of data replacement and avoid a failure in obtaining a query result since data in the cache memory are replaced according to the query request.
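The selective replacement of operations 503 to 505 may be sketched as follows. This is an illustrative Python sketch only: the data layout, the key name `orders_last_30_days`, and the helper name `fill_queried_nulls` are assumptions, and resolving a remaining null order count through the mapping at summation time is one possible reading of the reference to Table 4 in the example above.

```python
ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

mapping = {"age": "80s", "gender": "female", ORDERS: 0}  # Tables 6, 7, and 4

# Sparse cache mirroring Table 8; None stands for "NULL".
cache = {
    "Y001": {"age": None, "gender": None, ORDERS: None},
    "Y002": {"age": None, "gender": None, ORDERS: None},
    "Y003": {"age": None, "gender": "male", ORDERS: 5},
    "Y004": {"age": "90s", "gender": None, ORDERS: None},
    "Y005": {"age": "70s", "gender": "male", ORDERS: None},
}


def fill_queried_nulls(cache, mapping, queried_dims):
    """Operations 503/504: fill nulls only for dimensions named in the query condition."""
    replaced = 0
    for row in cache.values():
        for dim in queried_dims:
            if dim in mapping and row[dim] is None:
                row[dim] = mapping[dim]
                replaced += 1
    return replaced


condition = {"age": "80s", "gender": "female"}
replaced = fill_queried_nulls(cache, mapping, condition.keys())  # cache now mirrors Table 11

# Operation 505: rows matching the condition mirror Table 12.
matched = {
    user: row for user, row in cache.items()
    if all(row[dim] == value for dim, value in condition.items())
}
# A still-null order count is read as its characteristic value (Table 4).
result = sum(
    mapping[ORDERS] if row[ORDERS] is None else row[ORDERS]
    for row in matched.values()
)
```

Only six entries are replaced here, whereas filling every null entry of Table 8 would replace ten, which illustrates the reduced amount of data replacement.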

FIG. 6 is a schematic diagram illustrating an exemplary device for data processing 600 according to some embodiments of the present disclosure. In some embodiments, the data processing device 600 (also referred to as the device 600) may include a selection module 61, a recording module 62, a caching module 63, a search module 64, a replacement module 65, and a generation module 66.

In some embodiments, the modules of the device 600 may be hardware circuits of all or part of the processing engine 1112. The modules of the device 600 may also be implemented as an application or set of instructions read and executed by the processing engine 1112. Further, the modules may be any combination of the hardware circuits and the application or set of instructions. For example, the modules of the device 600 may be part of the processing engine 1112 when the processing engine 1112 is executing the application or set of instructions.

The selection module 61 may be configured to, for each feature dimension i of an original data set, select a characteristic value Mi from a plurality of feature values of the feature dimension i. In some embodiments, the original data set may include at least one feature dimension.

The original data set may be a data source for data processing, and include at least one feature dimension. The original data set may include, for example, user data of an Internet platform. The Internet platform may be a car hailing application, a website for user transactions, a website for user communication, etc. The user data may include at least one dimension of features. Each dimension of features may be represented as a feature dimension. Each feature dimension may include at least one feature value. The feature value(s) of a feature dimension may be discrete or continuous.

In some embodiments, the original data set may be a data set related to a plurality of users of a car hailing application. The original data set may include three feature dimensions, such as the age, the gender, and the number of orders in the last 30 days of the users as shown in Table 1. As shown in Table 1, the feature dimension “age” may include feature values of “70s”, “80s”, and “90s”. The feature dimension “gender” may include feature values of “female” and “male”. The feature dimension “the number of orders in the last 30 days” may include feature values of “0” and “5”.

In some embodiments, the characteristic value Mi of the feature dimension i may be selected according to a preset rule. For example, the characteristic value Mi may be selected based on a statistical distribution of the feature values of the feature dimension i. It should be noted that the characteristic value Mi may be selected by any other means. In some embodiments, a feature value of the feature dimension i with any distribution proportion may be designated as the characteristic value Mi. For example, any one of “70s”, “80s”, and “90s” may be selected as the characteristic value Mi of the feature dimension “age”. In order to improve the cache efficiency, a feature value having the largest distribution proportion may be selected as the characteristic value Mi. It should be understood that selecting any feature value as the characteristic value Mi may improve the cache efficiency, since feature values equal to Mi are not cached.

The recording module 62 may be configured to record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

The recording module 62 may establish a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi to record the corresponding relationship. In some embodiments, the corresponding relationship may be recorded in a mapping table. For example, if the selected characteristic value Mi of the feature dimension “age” is “70s”, a mapping table of age may be established as shown in Table 2. If the selected characteristic value Mi of the feature dimension “gender” is “male”, a mapping table of gender may be established as shown in Table 3. If the selected characteristic value Mi of the feature dimension “the number of orders in the last 30 days” is “0”, a mapping table of the number of orders in the last 30 days may be established as shown in Table 4. It should be noted that the mapping tables shown in Tables 2, 3, and 4 may be recorded and/or stored jointly in one mapping table.

The caching module 63 may be configured to, for each feature dimension i of the original data set, cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

For example, the characteristic values Mi of the feature dimensions “age”, “gender”, and “the number of orders in the last 30 days” may be “70s”, “male”, and “0”, respectively. The data cached into the cache memory may be shown in Table 5, in which “NULL” represents that the corresponding feature value(s) are null in the cache memory. It should be noted that the data cached by the caching module 63 may be determined according to the characteristic values Mi selected by the selection module 61. This is not intended to be limiting, and the cached data may not be limited to the example illustrated in Table 5 of the present disclosure.
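The caching step may be illustrated with the following Python sketch. It is a hedged example rather than the disclosed implementation: the rows of `original` are assumed to match Table 1, the key name `orders_last_30_days` and the helper name `build_sparse_cache` are hypothetical, and `None` stands for “NULL”.

```python
ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

# Rows assumed to match Table 1.
original = {
    "Y001": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y002": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y003": {"age": "80s", "gender": "male", ORDERS: 5},
    "Y004": {"age": "90s", "gender": "female", ORDERS: 0},
    "Y005": {"age": "70s", "gender": "male", ORDERS: 0},
}

# Characteristic values of this example (Tables 2, 3, and 4).
mapping = {"age": "70s", "gender": "male", ORDERS: 0}


def build_sparse_cache(data, mapping):
    """Cache only feature values that differ from the characteristic value."""
    return {
        user: {
            dim: (None if value == mapping[dim] else value)
            for dim, value in row.items()
        }
        for user, row in data.items()
    }


cache = build_sparse_cache(original, mapping)
# Number of entries left uncached (shown as "NULL").
null_count = sum(value is None for row in cache.values() for value in row.values())
```

With these characteristic values, seven entries are left uncached, and user “Y005” has no cached feature value at all.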

The search module 64 may be configured to perform a search in the cache memory in response to a query request.

The query request may include a single query condition or a compound query condition. The single query condition may only include one query condition, while the compound query condition may include at least two query conditions. For example, a query request including a single query condition may be a request to search data (e.g., the age, the gender, and the number of orders in the last 30 days) of a user whose ID is “Y001”, or a request to search data of all users whose age is “80s”. A query request including a compound query condition may be a request to search data of all users whose age is “80s” or “90s”, or a request to search data of all users whose age is “80s” and gender is “female”.

Upon receiving the query request, the search module 64 may perform the search in the cache memory according to the query request. In some embodiments, if the returned value(s) of the search based on the cached data are not null (that is, the search result of the search does not include any empty return), the processing engine 1112 (e.g., the generation module 66) may generate a query result based on the returned value(s). Merely by way of example, if the query condition of the query request is to search the age of a user whose ID is “Y001”, the corresponding query result generated based on the returned value(s) may be “80s”. If the query condition of the query request is to search the gender of a user whose ID is “Y002”, the corresponding query result generated based on the returned value(s) may be “female”. If the query condition of the query request is to search the number of orders in the last 30 days of a user whose ID is “Y003”, the corresponding query result generated based on the returned value(s) may be “5”. If the query condition of the query request is to search users whose age is “80s” and gender is “female”, the corresponding query result generated based on the returned value(s) may be “Y001 and Y002”. In some embodiments, if one or more returned values of the search based on the cached data are null (that is, the search result includes one or more empty returns), the replacement module 65 may be activated.

The replacement module 65 may be configured to replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship.

When a returned value of the search based on the cached data is null (also referred to as “empty”), the replacement module 65 may extract a feature dimension corresponding to the query condition according to the query request. The replacement module 65 may replace the null value(s) of the extracted feature dimension in the cache memory by the corresponding characteristic value Mi based on the recorded corresponding relationship of the extracted feature dimension and its characteristic value Mi.

The generation module 66 may be configured to generate the query result according to the replacement result.

For example, if the query condition of the query request is to search the gender of a user whose ID is “Y003”, the corresponding feature value may be null in the cache memory, which may result in a null return value. The replacement module 65 may extract the feature dimension corresponding to the query condition based on the query request, that is, the feature dimension “gender”. According to the pre-stored mapping table of gender (e.g., Table 3), the characteristic value of the feature dimension “gender” may be “male”. The replacement module 65 may replace the null feature value(s) of the feature dimension “gender” in the cache memory by “male”. In that case, the replacement result of the null feature value(s) of the feature dimension “gender” may be “male”. The generation module 66 may then generate the query result based on the replacement result. For example, the query result may be that “the gender of the user whose ID is “Y003” is male”.

As another example, if the query condition of the query request is to search the age and the number of orders in the last 30 days of a user whose ID is “Y001”, two feature values A and B may be involved. The feature value A may be “80s”, which is the age of the user whose ID is “Y001”. The feature value B may be null in the cache memory, which is the number of orders in the last 30 days of the user whose ID is “Y001”. According to the feature value A, the generation module 66 may generate a query result, e.g., “the age of the user whose ID is “Y001” is 80s”. According to the feature value B, the replacement module 65 may extract a feature dimension corresponding to the query condition based on the query request, that is, the feature dimension “the number of orders in the last 30 days”. Based on the pre-stored mapping table of the number of orders in the last 30 days (e.g., Table 4), the replacement module 65 may replace the null feature value(s) of the feature dimension “the number of orders in the last 30 days” in the cache memory by “0”. In that case, the replacement result of the null feature value(s) of the feature dimension “the number of orders in the last 30 days” may be “0”. The generation module 66 may generate the query result based on the replacement result. For example, the query result may be that “the number of orders in the last 30 days of the user whose ID is “Y001” is 0”.
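The interplay of the search module 64, the replacement module 65, and the generation module 66 in the two examples above may be sketched as follows. This Python sketch is illustrative only: the cache contents are assumed to match Table 5, and the key name `orders_last_30_days` and the function name `query_feature` are hypothetical.

```python
ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

mapping = {"age": "70s", "gender": "male", ORDERS: 0}  # Tables 2, 3, and 4

# Sparse cache assumed to match Table 5; None stands for "NULL".
cache = {
    "Y001": {"age": "80s", "gender": "female", ORDERS: None},
    "Y002": {"age": "80s", "gender": "female", ORDERS: None},
    "Y003": {"age": "80s", "gender": None, ORDERS: 5},
    "Y004": {"age": "90s", "gender": "female", ORDERS: None},
    "Y005": {"age": None, "gender": None, ORDERS: None},
}


def query_feature(cache, mapping, user_id, dim):
    """Look up one feature value, falling back to the characteristic value."""
    value = cache[user_id][dim]  # search in the cache memory
    if value is None:  # empty return: replacement is activated
        # Replace the null value(s) of the extracted feature dimension.
        for row in cache.values():
            if row[dim] is None:
                row[dim] = mapping[dim]
        value = cache[user_id][dim]
    return value  # the query result is generated from this value


gender_y003 = query_feature(cache, mapping, "Y003", "gender")
age_y001 = query_feature(cache, mapping, "Y001", "age")
orders_y001 = query_feature(cache, mapping, "Y001", ORDERS)
```

The first and third lookups hit a null entry and are answered through the mapping, while the second is answered directly from the cached value.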

Merely by way of example, as shown in Table 5, if there are 5 users, the cached user information may have 7 fewer fields than the original data set. The mapping table of feature dimensions and characteristic values may need 6 fields to keep the user information complete. Thus, according to the data processing process disclosed in the present disclosure, the cached user information and the mapping table together may need 1 fewer field in the cache memory than the original data set. Compared with caching the original data set, the data processing process disclosed in the present disclosure may save a large number of fields when there are a great number of users (e.g., 200 million) and hundreds of feature dimensions. In this way, the total amount of cached data may be compressed, the cache efficiency may be improved, and the caching cost may be reduced without any loss of information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.

In some embodiments, the data processing process described above may be implemented on an online service system. The online service system may be configured with a cache cluster and an access server. User feature data may be cached in a memory of the cache cluster, which can increase the speed of accessing data. The corresponding relationship of a feature dimension and its characteristic value may be stored locally on the access server, or in another storage device of the online service system. The access server may access the corresponding relationship when a feature value is null in the cache memory. In some embodiments, when the online service system adopts a machine learning or deep learning technique, the online service system may use the data processing process in machine learning or deep learning models, taking advantage of artificial intelligence.

The technical solutions disclosed in the present disclosure may include the following beneficial effects.

For each feature dimension i in the original data set, the selection module 61 may select a characteristic value Mi from a plurality of feature values of the feature dimension i. The recording module 62 may record a corresponding relationship between the feature dimension i and Mi. The caching module 63 may cache feature values of the feature dimension i other than the characteristic value Mi for each feature dimension i of the original data set into a cache memory. In this case, the amount of cached data may be reduced. The search module 64 may perform a search in the cache memory in response to a query request. If the returned value(s) of the search based on the cached data are not null, the generation module 66 may generate a query result according to the returned value(s). On the other hand, if the returned value(s) of the search based on the cached data include a null value, the replacement module 65 may replace one or more feature values that are null in the cache memory by the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship. The generation module 66 may generate a query result based on the replacement result. In this case, the total amount of cached data may be compressed. The cache efficiency may be improved, and the caching cost can be reduced without any loss of information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.

FIG. 7 is a schematic diagram illustrating an exemplary device for data processing 700 according to some embodiments of the present disclosure. In some embodiments, the data processing device 700 (also referred to as the device 700) may include a selection module 61, a recording module 62, a caching module 63, a search module 64, a replacement module 65, and a generation module 66. The selection module 61 may include a determination sub-module 611. The determination sub-module 611 may be configured to determine a feature value with the largest distribution proportion among a plurality of feature values of a feature dimension i as a characteristic value Mi of the feature dimension i.

In some embodiments, the modules of the device 700 may be hardware circuits of all or part of the processing engine 1112. The modules of the device 700 may also be implemented as an application or set of instructions read and executed by the processing engine 1112. Further, the modules may be any combination of the hardware circuits and the application or set of instructions. For example, the modules of the device 700 may be part of the processing engine 1112 when the processing engine 1112 is executing the application or set of instructions.

In some embodiments, the original data set may be shown in Table 1. The determination sub-module 611 may determine a feature value having the largest distribution proportion of each feature dimension i as the characteristic value Mi of the feature dimension i based on a statistical distribution of the feature values of the feature dimension i. For example, for the feature dimension “age”, users whose age is “80s” have the largest distribution proportion. Thus, “80s” may be determined as the characteristic value Mi of the feature dimension “age”. Similarly, “female” may be determined as the characteristic value Mi of the feature dimension “gender”. “0” may be determined as the characteristic value Mi of the feature dimension “the number of orders in the last 30 days”. A corresponding relationship between the feature dimension “age” and its corresponding characteristic value Mi (also referred to as a mapping table of age) may be established as shown in Table 6. A corresponding relationship between the feature dimension “gender” and its corresponding characteristic value Mi (also referred to as a mapping table of gender) may be established as shown in Table 7. A corresponding relationship between the feature dimension “the number of orders in the last 30 days” and its corresponding characteristic value Mi (also referred to as a mapping table of the number of orders in the last 30 days) may be established as shown in Table 4 described in connection with FIG. 1.
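Selecting the feature value with the largest distribution proportion may be sketched in Python with `collections.Counter`. This is a minimal sketch under stated assumptions: the rows of `original` are assumed to match Table 1, and the key name `orders_last_30_days` and the function name `select_characteristic_values` are hypothetical. How ties would be broken is not specified in the text and is left to `most_common` here.

```python
from collections import Counter

ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

# Rows assumed to match Table 1.
original = {
    "Y001": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y002": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y003": {"age": "80s", "gender": "male", ORDERS: 5},
    "Y004": {"age": "90s", "gender": "female", ORDERS: 0},
    "Y005": {"age": "70s", "gender": "male", ORDERS: 0},
}


def select_characteristic_values(data):
    """Return, per feature dimension, the most frequent feature value."""
    dims = next(iter(data.values())).keys()
    mapping = {}
    for dim in dims:
        counts = Counter(row[dim] for row in data.values())
        # most_common(1) yields [(value, count)] for the most frequent value.
        mapping[dim] = counts.most_common(1)[0][0]
    return mapping


mapping = select_characteristic_values(original)
```

The resulting `mapping` plays the role of the mapping tables of Tables 6, 7, and 4 stored jointly.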

The caching module 63 may, for each feature dimension i of the original data set, cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory. For example, the data cached into the cache memory by the caching module 63 may be shown in Table 8. Details regarding the recording module 62, the search module 64, the replacement module 65, and the generation module 66 may be found elsewhere in the present disclosure (e.g., FIG. 6 and the relevant descriptions thereof).

Merely by way of example, as shown in Table 8, if there are 5 users, the cached user information may have 10 fewer fields than the original data set. The mapping table of feature dimensions and characteristic values may need 6 fields to keep the user information complete. Thus, according to the data processing process disclosed in the present disclosure, the cached user information and the mapping table together may need 4 fewer fields in the cache memory than the original data set. Compared with caching the original data set, the data processing process disclosed in the present disclosure may save a large number of fields when there are a great number of users (e.g., 200 million) and hundreds of feature dimensions. In this way, the total amount of cached data may be greatly compressed, the cache efficiency may be greatly improved, and the caching cost may be greatly reduced without any loss of information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.
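The field counts in this example, and in the corresponding example for Table 5, may be checked with a short Python sketch. The rows of `original` are assumed to match Table 1, the function name `net_saved_fields` and the key name `orders_last_30_days` are hypothetical, and the mapping table is counted as two fields (dimension and characteristic value) per feature dimension, as the figure of 6 fields suggests.

```python
ORDERS = "orders_last_30_days"  # assumed key for "the number of orders in the last 30 days"

# Rows assumed to match Table 1.
original = {
    "Y001": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y002": {"age": "80s", "gender": "female", ORDERS: 0},
    "Y003": {"age": "80s", "gender": "male", ORDERS: 5},
    "Y004": {"age": "90s", "gender": "female", ORDERS: 0},
    "Y005": {"age": "70s", "gender": "male", ORDERS: 0},
}


def net_saved_fields(data, mapping):
    """Return (omitted fields, mapping-table fields, net saving)."""
    omitted = sum(
        row[dim] == mapping[dim] for row in data.values() for dim in mapping
    )
    mapping_fields = 2 * len(mapping)  # one dimension field and one value field each
    return omitted, mapping_fields, omitted - mapping_fields


# Most frequent values as characteristic values (Tables 6, 7, 4; cache of Table 8).
most_frequent = net_saved_fields(original, {"age": "80s", "gender": "female", ORDERS: 0})
# Characteristic values of the earlier example (Tables 2, 3, 4; cache of Table 5).
arbitrary = net_saved_fields(original, {"age": "70s", "gender": "male", ORDERS: 0})
```

Since the mapping table has a fixed size, the net saving grows with the number of entities whose feature values equal the characteristic value, which is why the most frequent value is the favorable choice.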

In some embodiments, the feature value having the largest distribution proportion of the feature dimension i may be determined as the characteristic value Mi of the feature dimension i. This may greatly compress the total amount of cached data, improve the cache efficiency, and reduce the caching cost without a loss of the total amount of information. The saved cache space may be used for caching data of more feature dimensions when the cache capacity is limited, which may improve the integrity of data.

It should be noted that the original data set may need to be updated due to actual conditions, such as registrations of new users, massive data updating, policy changes of the platform, etc. Therefore, the corresponding relationship between a feature dimension and its characteristic value may need to be updated to ensure the accuracy of the query result. The update of the original data set and/or the corresponding relationship may be caused by various factors, such as but not limited to a manual input (e.g., an instruction of a user received from a terminal 1130), a time condition (e.g., a requirement of periodical update or real-time update), a certain event (e.g., a massive data updating, registrations of new users), etc.

In some embodiments, as shown in FIG. 7, the data processing device 700 may further include a first updating module 71, a second updating module 72, a third updating module 73, and a fourth updating module 74.

The first updating module 71 may be configured to update the original data set. The updated original data set may be regarded as a new data source on which the data processing device disclosed in the present disclosure is implemented. In some embodiments, the updated original data set may include feature information of one or more new entities. Additionally or alternatively, the updated original data set may include updated feature information of the original entities, such as but not limited to updated feature values of the original feature dimension(s) or feature information of one or more new feature dimensions.

The second updating module 72 may be configured to update a distribution proportion of each feature value of the feature dimension i based on the updated original data set.

Merely by way of example, “female” is a feature value that has the largest distribution proportion in the feature dimension “gender”. After the original data set is updated, “male” becomes a feature value that has the largest distribution proportion in the feature dimension “gender”.

The third updating module 73 may be configured to update the characteristic value Mi based on the updated distribution proportion(s) of feature value(s) in the feature dimension i.

In the above-mentioned example, the characteristic value Mi of the feature dimension “gender” may change from “female” to “male”.

The fourth updating module 74 may be configured to update the corresponding relationship between i and Mi based on the updated characteristic value Mi.

In the above-mentioned example, the corresponding relationship between the feature dimension “gender” and its characteristic value Mi may be updated. For example, the mapping table of gender as shown in Table 7 may be updated to the mapping table of gender as shown in Table 3.

In some embodiments, after the first updating module 71, the second updating module 72, the third updating module 73, and the fourth updating module 74 are executed, the caching module 63 may be activated. The caching module 63 may, for each feature dimension i in the updated original data set, cache feature value(s) other than the corresponding updated characteristic value Mi into the cache memory.
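The update sequence of modules 71 to 74 followed by re-caching may be sketched as follows for the feature dimension “gender”. This Python sketch is illustrative only: the function name `recompute` is hypothetical, and the two added users “Y006” and “Y007” are invented here solely to make “male” the most frequent value.

```python
from collections import Counter


def recompute(data, dim):
    """Recompute the characteristic value, the mapping, and the sparse cache for one dimension."""
    counts = Counter(row[dim] for row in data.values())  # updated distribution
    characteristic = counts.most_common(1)[0][0]  # updated characteristic value
    mapping = {dim: characteristic}  # updated corresponding relationship
    # Re-cache: store only values that differ from the updated characteristic value.
    cache = {
        user: {dim: (None if row[dim] == characteristic else row[dim])}
        for user, row in data.items()
    }
    return mapping, cache


# Gender column assumed to match Table 1.
data = {
    "Y001": {"gender": "female"},
    "Y002": {"gender": "female"},
    "Y003": {"gender": "male"},
    "Y004": {"gender": "female"},
    "Y005": {"gender": "male"},
}
mapping_before, cache_before = recompute(data, "gender")

# Update of the original data set: two hypothetical new users register.
data.update({"Y006": {"gender": "male"}, "Y007": {"gender": "male"}})
mapping_after, cache_after = recompute(data, "gender")
```

After the update, previously uncached “female” entries are cached explicitly and “male” entries become null, so the cache and the mapping stay consistent with each other.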

In the present disclosure, the feature value having the largest distribution proportion of a feature dimension i may be determined as the characteristic value Mi of the feature dimension i, which may greatly compress the total amount of cached data, improve the cache efficiency, and reduce the caching cost without losing the total amount of information. Besides, the cached data and the corresponding relationship(s) between feature dimension(s) and the corresponding characteristic value(s) may be updated based on the updated original data set, which may ensure the accuracy of the query result.

FIG. 8 is a schematic diagram illustrating an exemplary device for data processing 800 according to some embodiments of the present disclosure. In some embodiments, the data processing device 800 (also referred to as the device 800) may include a selection module 61, a recording module 62, a caching module 63, a search module 64, a replacement module 65, and a generation module 66.

The modules of the device 800 may be hardware circuits of all or part of the processing engine 1112. The modules of the device 800 may also be implemented as an application or set of instructions read and executed by the processing engine 1112. Further, the modules may be any combination of the hardware circuits and the application or set of instructions. For example, the modules of the device 800 may be part of the processing engine 1112 when the processing engine 1112 is executing the application or set of instructions.

The selection module 61 may be configured to, for each feature dimension i of an original data set, select a characteristic value Mi from a plurality of feature values of the feature dimension i.

The recording module 62 may be configured to record a correspondingrelationship between each feature dimension i and the correspondingcharacteristic feature Mi.

The caching module 63 may be configured to, for each feature dimension i of the original data set, cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

Details regarding the selection module 61, the recording module 62, and the caching module 63 may be found elsewhere in the present disclosure (e.g., FIG. 6 and the relevant descriptions thereof).

In some embodiments, the original data set may be shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be shown in Tables 4, 6, and 7. The cached data may be shown in Table 8.

The search module 64 may be configured to perform a search in the cache memory in response to a query request.

The search module 64 may include a determination sub-module 641, a first replacement sub-module 642, and a first search sub-module 643.

The determination sub-module 641 may be configured to, in response to a query request, determine whether the query request includes a query condition that includes a feature dimension i and its corresponding characteristic value Mi based on the recorded corresponding relationship(s). In response to a determination that the query request includes the query condition that includes the feature dimension i and its corresponding characteristic value Mi, the first replacement sub-module 642 may be activated.

Merely by way of example, a query request is to search the IDs of users who satisfy a query condition “age=80s”. Upon receiving the query request, the determination sub-module 641 may determine that the query condition includes the feature dimension “age” and its characteristic value “80s” according to the mapping table of age as shown in Table 6. Then the first replacement sub-module 642 may be activated.

The first replacement sub-module 642 may be configured to replace the characteristic value Mi in the query condition of the query request with a null value. In other words, the first replacement sub-module 642 may update the query request by substituting an empty entry (e.g., the null value) for the characteristic value. The updated query request may include the feature dimension i and the empty entry.

The first search sub-module 643 may be configured to perform a search in the cache memory based on the updated query request.

In the above-mentioned example, the original query condition for searching users' IDs may be “age=80s”, which may be updated to a query condition “age=null”. Therefore, the processing engine 1112 (e.g., the generation module 66) may generate a query result that satisfies the original query condition by performing the search in the cache memory based on the updated query request. For example, the query result may be Y001, Y002, and Y003.
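The query rewriting performed by the determination sub-module 641, the first replacement sub-module 642, and the first search sub-module 643 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `rewrite_conditions` and `search_ids` are hypothetical, `None` stands in for the null value, and the cached rows are invented sample data rather than the contents of Table 8.

```python
def rewrite_conditions(conditions, mapping):
    """Replace a condition value with None when it equals the recorded
    characteristic value of that feature dimension."""
    return {
        dim: (None if dim in mapping and mapping[dim] == value else value)
        for dim, value in conditions.items()
    }

def search_ids(cache, conditions):
    """Return the sorted ids of entities whose cached values match every condition."""
    return sorted(
        entity_id
        for entity_id, row in cache.items()
        if all(row.get(dim) == wanted for dim, wanted in conditions.items())
    )

# Hypothetical mapping and cached data (None marks an uncached value).
mapping = {"age": "80s", "gender": "female"}
cache = {
    "Y001": {"age": None, "gender": None},
    "Y002": {"age": None, "gender": None},
    "Y003": {"age": None, "gender": "male"},
    "Y004": {"age": "90s", "gender": None},
}

original = {"age": "80s"}
updated = rewrite_conditions(original, mapping)

print(search_ids(cache, original))  # [] : the literal value "80s" is never cached
print(search_ids(cache, updated))   # ['Y001', 'Y002', 'Y003']
```

Searching with the original condition finds nothing because the characteristic value is never stored, which is the failure the rewriting step is meant to avoid.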

The replacement module 65 may be configured to replace one or more feature values that are null in the cache memory with the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s).

The generation module 66 may be configured to generate a query result based on the replacement result.

Details regarding the replacement module 65 and the generation module 66 may be found elsewhere in the present disclosure (e.g., FIG. 6 and the relevant descriptions thereof).

Merely by way of example, the query request is to search data (e.g., the gender, the age, and the number of orders in the last 30 days) of all users who satisfy the query conditions “age=80s” and “gender=male”. Upon receiving the query request, the determination sub-module 641 may determine that the query request includes a query condition including the feature dimension “age” and the characteristic value “80s” based on the mapping table of age as shown in Table 6. The first replacement sub-module 642 may replace the characteristic value “80s” in the query condition of the query request with a null value. Thus, the updated query request may be to search data (e.g., the gender, the age, and the number of orders in the last 30 days) of all users who satisfy the updated query conditions “age=Null” and “gender=male”. The first search sub-module 643 may perform the search in the cache memory based on the updated query request. The query result may be shown in Table 9.

In the present disclosure, if the query request includes a query condition including a feature dimension i and its corresponding characteristic value Mi, the processing engine 1112 (e.g., the first replacement sub-module 642) may update the query request. The processing engine 1112 (e.g., the first search sub-module 643) may then perform a search in the cache memory based on the updated query request. This may avoid a failure in obtaining the query result according to the original query request.

FIG. 9 is a schematic diagram illustrating an exemplary device for data processing 900 according to some embodiments of the present disclosure. In some embodiments, the data processing device 900 (also referred to as the device 900) may include a selection module 61, a recording module 62, a caching module 63, a search module 64, a replacement module 65, and a generation module 66.

The modules of the device 900 may be hardware circuits of all or part of the processing engine 1112. The modules of the device 900 may also be implemented as an application or set of instructions read and executed by the processing engine 1112. Further, the modules may be any combination of the hardware circuits and the application or set of instructions. For example, the modules of the device 900 may be part of the processing engine 1112 when the processing engine 1112 is executing the application or set of instructions.

The selection module 61 may be configured to, for each feature dimension i of an original data set, select a characteristic value Mi from a plurality of feature values of the feature dimension i.

The recording module 62 may be configured to record a corresponding relationship between each feature dimension i and the corresponding characteristic value Mi.

The caching module 63 may be configured to, for each feature dimension i of the original data set, cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

Details regarding the selection module 61, the recording module 62, and the caching module 63 may be found elsewhere in the present disclosure (e.g., FIG. 6 and the relevant descriptions thereof).

In some embodiments, the original data set may be shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be shown in Tables 4, 6, and 7. The cached data may be shown in Table 8.

The search module 64 may be configured to perform a search in the cache memory in response to a query request.

The search module 64 may include a second replacement sub-module 644 and a second search sub-module 645.

The second replacement sub-module 644 may be configured to replace null feature value(s) in the cache memory with the corresponding characteristic value(s) Mi based on the corresponding relationship(s).

For example, the second replacement sub-module 644 may replace feature values that are “Null” in Table 8 with the corresponding characteristic values according to the corresponding relationships as shown in Tables 4, 6, and 7. Merely by way of example, the second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “age” with “80s”. The second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “gender” with “female”. The second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “the number of orders in the last 30 days” with “0”. The cached data after the replacement may be shown in Table 1.

The second search sub-module 645 may be configured to perform a search in the cache memory based on the query request.

The replacement module 65 may be configured to replace one or more feature values that are null in the cache memory with the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s).

The generation module 66 may be configured to generate a query result based on the replacement result.

Details regarding the replacement module 65 and the generation module 66 may be found elsewhere in the present disclosure (e.g., FIG. 6 and the relevant descriptions thereof).

Merely by way of example, the query request may include a query condition to search the total number of orders in the last 30 days of all users whose age is “80s” and gender is “female”. Upon receiving the query request and according to the corresponding relationships in Tables 4, 6, and 7, the second replacement sub-module 644 may replace the Null feature value(s) of the feature dimension “age” with “80s”, the Null feature value(s) of the feature dimension “gender” with “female”, and the Null feature value(s) of the feature dimension “the number of orders in the last 30 days” with “0”. The cached data after the replacement may be shown in Table 1. According to Table 1, data of users who satisfy the query condition of the query request may be shown in Table 10. As shown in Table 10, users “Y001” and “Y002” satisfy the query condition of the query request. The processing engine 1112 (e.g., the generation module 66) may sum up the numbers of orders in the last 30 days of “Y001” and “Y002” to generate the query result. Since the numbers of orders in the last 30 days of “Y001” and “Y002” are both 0, the query result may be 0.

In the present disclosure, upon receiving the query request, the processing engine 1112 (e.g., the second replacement sub-module 644) may replace the null feature value(s) in the cache memory with the corresponding characteristic value(s) Mi according to the recorded corresponding relationship(s). The processing engine 1112 (e.g., the second search sub-module 645) may then perform a search in the cache memory based on the query request, which can avoid a failure in obtaining a query result according to the query request.
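The replace-then-search flow of the device 900 may be sketched in Python as below. The function `fill_nulls`, the key `orders_30d`, and the sample rows are hypothetical names and data introduced for illustration; `None` represents a Null entry, and the rows are not the actual contents of Table 8.

```python
def fill_nulls(cache, mapping):
    """Return a copy of the cache in which every None is replaced by the
    characteristic value recorded for that feature dimension."""
    return {
        entity_id: {
            dim: (default if row.get(dim) is None else row[dim])
            for dim, default in mapping.items()
        }
        for entity_id, row in cache.items()
    }

# Hypothetical recorded relationships and cached data.
mapping = {"gender": "female", "age": "80s", "orders_30d": 0}
cache = {
    "Y001": {"gender": None, "age": None, "orders_30d": None},
    "Y002": {"gender": None, "age": None, "orders_30d": None},
    "Y003": {"gender": "male", "age": None, "orders_30d": 2},
    "Y004": {"gender": None, "age": "90s", "orders_30d": None},
}

restored = fill_nulls(cache, mapping)

# Query: total orders in the last 30 days of users aged "80s" with gender "female".
matches = {
    entity_id: row
    for entity_id, row in restored.items()
    if row["age"] == "80s" and row["gender"] == "female"
}
total_orders = sum(row["orders_30d"] for row in matches.values())
print(sorted(matches), total_orders)
```

Because every Null is restored before the search, the original query condition can be applied unchanged, at the cost of touching every dimension in the cache.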

In some embodiments, the second replacement sub-module 644 of the device 900 may be further configured to determine whether the query request includes a query condition that includes a feature dimension i based on the recorded corresponding relationship(s). In response to a determination that the query request includes the query condition that includes the feature dimension i, the second replacement sub-module 644 may replace null feature value(s) corresponding to the feature dimension i in the cache memory with the corresponding characteristic value(s) Mi.

In some embodiments, the original data set may be shown in Table 1. The corresponding relationships between feature dimensions and the corresponding characteristic values may be shown in Tables 4, 6, and 7. The cached data may be shown in Table 8. The second replacement sub-module 644 may determine whether the query condition of the query request includes at least one of the feature dimensions “gender”, “age”, or “the number of orders in the last 30 days”. If the query condition of a query request includes the feature dimension “age”, the second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “age” as shown in Table 8 with “80s”. If the query condition of a query request includes the feature dimension “gender”, the second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “gender” as shown in Table 8 with “female”. If the query condition of a query request includes the feature dimension “the number of orders in the last 30 days”, the second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “the number of orders in the last 30 days” as shown in Table 8 with “0”.

Merely by way of example, the query request may include a query condition to search the total number of orders in the last 30 days of all users whose age is “80s” and gender is “female”. Upon receiving the query request, according to the corresponding relationships in Tables 6 and 7, the second replacement sub-module 644 may replace the “Null” feature value(s) of the feature dimension “age” as shown in Table 8 with “80s”, and the “Null” feature value(s) of the feature dimension “gender” as shown in Table 8 with “female”. The cached data after the replacement may be shown in Table 11. Based on Table 11, data of users who satisfy the query condition of the query request may be shown in Table 12. As shown in Table 12, users “Y001” and “Y002” satisfy the query condition of the query request. The generation module 66 may sum up the numbers of orders in the last 30 days of “Y001” and “Y002” to generate the query result. According to Table 4, since the numbers of orders in the last 30 days of “Y001” and “Y002” are both 0, the query result may be 0.

In the present disclosure, in response to a query request, the second replacement sub-module 644 may determine whether the query condition of the query request includes a feature dimension i based on the recorded corresponding relationship. In response to a determination that the query condition includes the feature dimension i, the second replacement sub-module 644 may replace null feature value(s) of the feature dimension i in the cache memory with the corresponding characteristic value Mi. The second search sub-module 645 may then perform a search in the cache memory based on the query request. Because data in the cache memory are replaced only for the feature dimension(s) named in the query request, the devices herein described may reduce the amount of data replacement while still avoiding a failure in obtaining a query result.
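The selective variant, in which only the feature dimensions named in the query condition are restored, might look like the following sketch. The names `fill_selected` and `total_orders`, the key `orders_30d`, and the sample rows are hypothetical; resolving a Null order count through the recorded mapping at summation time is an illustrative reading of the example above, not a confirmed implementation detail.

```python
def fill_selected(cache, mapping, dims):
    """Replace None only in the listed feature dimensions; other
    dimensions are left exactly as cached."""
    return {
        entity_id: {
            dim: (mapping[dim] if value is None and dim in dims else value)
            for dim, value in row.items()
        }
        for entity_id, row in cache.items()
    }

def total_orders(cache, mapping, conditions, target="orders_30d"):
    """Sum the target dimension over entities that satisfy the conditions."""
    partial = fill_selected(cache, mapping, set(conditions))
    total = 0
    for row in partial.values():
        if all(row[dim] == wanted for dim, wanted in conditions.items()):
            value = row[target]
            # A Null target value is resolved through the recorded mapping.
            total += mapping[target] if value is None else value
    return total

# Hypothetical recorded relationships and cached data.
mapping = {"gender": "female", "age": "80s", "orders_30d": 0}
cache = {
    "Y001": {"gender": None, "age": None, "orders_30d": None},
    "Y002": {"gender": None, "age": None, "orders_30d": None},
    "Y003": {"gender": "male", "age": None, "orders_30d": 2},
    "Y004": {"gender": None, "age": "90s", "orders_30d": None},
}

conditions = {"age": "80s", "gender": "female"}
partial = fill_selected(cache, mapping, set(conditions))
print(partial["Y001"])  # "orders_30d" is still None here
print(total_orders(cache, mapping, conditions))
```

Compared with restoring every dimension, this variant leaves the order counts untouched in the cache and only rewrites the two dimensions that the query condition actually tests.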

In the present disclosure, the embodiments of device(s) and process(es) may complement and reinforce each other without conflict. The embodiments of device(s) are provided merely for illustration purposes. The units (or modules) described as separate components may or may not be physically separated. The components shown as units (or modules) may or may not be physical units (or modules). Thus, these components may be located in one place or distributed on a plurality of network units. Some or all modules may be selected to implement the present disclosure according to actual needs. Those skilled in the art can understand and implement the embodiments without creative efforts.

The present disclosure also provides a computer storage medium. The computer storage medium may store computer programs. When executed by a processor, the computer programs may cause a computer device to perform the following operations.

For each feature dimension i of an original data set, the computer programs may cause the computer device to select a characteristic value Mi from a plurality of feature values of the feature dimension i. The computer programs may cause the computer device to record a corresponding relationship between each feature dimension i and its characteristic value Mi. The original data set may include at least one feature dimension.

For each feature dimension i of the original data set, the computer programs may cause the computer device to cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

The computer programs may cause the computer device to perform a search in the cache memory in response to a query request.

The computer programs may cause the computer device to replace one or more feature values that are null in the cache memory with the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s). The computer programs may further cause the computer device to generate a query result based on the replacement result.

The processes and systems herein described may be used to improve the functions of computers (also referred to as servers) in a wide variety of application scenarios. In essence, by reducing the data and/or values that need to be cached, the processes and systems in the present disclosure improve the efficiency of using the cache memory, especially when dealing with large data sets. The computers may thereby process data faster and return results for searches and inquiries more quickly.

FIG. 10 is a schematic diagram of an exemplary electronic device 1000 according to some embodiments of the present disclosure. As shown in FIG. 10, at a hardware level, the electronic device 1000 may include a processor 1010, an internal bus 1020, a network port 1030, a memory 1040, a non-volatile memory 1050, and other hardware components (not shown in FIG. 10). The processor 1010 may read computer programs from the non-volatile memory 1050 into the memory 1040, and run the computer programs in the memory 1040, forming a data processing device at a logical level. It should be noted that the present disclosure does not exclude other implementations (e.g., a logic element implementation, a combination implementation of hardware and software) other than the software implementation. That is, an execution subject of the following process may not be limited to logic units, and may also be hardware or logic elements.

The processor 1010 may be configured to, for each feature dimension i of an original data set, select a characteristic value Mi from a plurality of feature values of the feature dimension i. The processor 1010 may also record a corresponding relationship between each feature dimension i and its characteristic value Mi. The original data set may include at least one feature dimension.

The processor 1010 may also be configured to, for each feature dimension i of the original data set, cache feature value(s) of the feature dimension i other than the corresponding characteristic value Mi into a cache memory.

The processor 1010 may also be configured to perform a search in the cache memory in response to a query request.

The processor 1010 may further be configured to replace one or more feature values that are null in the cache memory with the corresponding characteristic value(s) Mi based on the query request and the recorded corresponding relationship(s). The processor 1010 may then generate a query result based on the replacement result.

FIG. 11 is a block diagram illustrating an exemplary data processing system 1100 according to some embodiments of the present disclosure. In some embodiments, the data processing system 1100 may be a platform in which information related to entities in the platform is stored and/or processed. In some embodiments, the platform may be an online platform providing an online service, such as an entertainment service, a search service, a communication service, an e-commerce service, or the like, or any combination thereof. In certain embodiments, the data processing system 1100 may be an online platform providing an Online to Offline (O2O) service, such as but not limited to a transportation service (e.g., a taxi-hailing service, a chauffeur service, an express car service, a carpool service, a bus service, a driver hire service, and a shuttle service), a meal booking service, an online shopping service, or the like. The data processing system 1100 may include a server 1110, a network 1120, a terminal 1130, a storage device 1140, and a cache memory 1150.

In some embodiments, the server 1110 may be a single server or a server group. The server group may be centralized or distributed (e.g., the server 1110 may be a distributed system). In some embodiments, the server 1110 may be local or remote. For example, the server 1110 may access information and/or data stored in the terminal 1130, the storage device 1140, and/or the cache memory 1150 via the network 1120. As another example, the server 1110 may be directly connected to the terminal 1130, the storage device 1140, and/or the cache memory 1150 to access stored information and/or data. In some embodiments, the server 1110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 1110 may be implemented on a computing device 1200 having one or more components illustrated in FIG. 12 in the present disclosure.

In some embodiments, the server 1110 may include a processing engine 1112. In some embodiments, the processing engine 1112 may include one or more processing engines (e.g., single-core processing engine(s) or multi-core processor(s)). Merely by way of example, the processing engine 1112 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction-set computer (RISC), a microprocessor, or the like, or any combination thereof. In some embodiments, at least part of the server 1110 (e.g., the processing engine 1112) may be integrated into the terminal 1130.

The network 1120 may facilitate exchange of information and/or data. In some embodiments, one or more components of the data processing system 1100 (e.g., the server 1110, the terminal 1130, the storage device 1140, and the cache memory 1150) may transmit information and/or data to other component(s) of the data processing system 1100 via the network 1120. For example, the server 1110 may receive a request from the terminal 1130 via the network 1120. In some embodiments, the network 1120 may be any type of wired or wireless network, or combination thereof. Merely by way of example, the network 1120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 1120 may include one or more network access points. For example, the network 1120 may include wired or wireless network access points such as base stations and/or internet exchange points 1120-1, 1120-2, through which one or more components of the data processing system 1100 may be connected to the network 1120 to exchange data and/or information.

In some embodiments, the terminal 1130 may include a mobile device 1130-1, a tablet computer 1130-2, a laptop computer 1130-3, a built-in device in a vehicle 1130-4, or the like, or any combination thereof. In some embodiments, the mobile device 1130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footgear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA), a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include Google™ Glasses, an Oculus Rift, a HoloLens, a Gear VR, etc. In some embodiments, the built-in device in the vehicle 1130-4 may include an onboard computer, an onboard television, etc. In some embodiments, the terminal 1130 may communicate with the server 1110 via a wireless connection. For example, the terminal 1130 may receive information and/or instructions inputted by a user, and send the received information and/or instructions to the server 1110 via the network 1120.

The storage device 1140 may store data and/or instructions. In some embodiments, the storage device 1140 may store data obtained from the terminal 1130. In some embodiments, the storage device 1140 may store data and/or instructions that the server 1110 may execute or use to perform exemplary methods described in the present disclosure. Merely by way of example, the storage device 1140 may store a set of instructions related to data querying. As another example, the storage device 1140 may store feature information of a plurality of entities of the data processing system 1100. In some embodiments, the storage device 1140 may include a mass storage, removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random-access memory (RAM). Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc. Exemplary ROM may include a mask ROM (MROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically-erasable programmable ROM (EEPROM), a compact disk ROM (CD-ROM), a digital versatile disk ROM, etc. In some embodiments, the storage device 1140 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.

In some embodiments, the storage device 1140 may be connected to the network 1120 to communicate with one or more components of the data processing system 1100 (e.g., the server 1110, the terminal 1130, or the cache memory 1150). One or more components of the data processing system 1100 may access the data or instructions stored in the storage device 1140 via the network 1120. In some embodiments, the storage device 1140 may be directly connected to or communicate with one or more components of the data processing system 1100 (e.g., the server 1110, the terminal 1130, the cache memory 1150). In some embodiments, the storage device 1140 may be part of the server 1110 or the terminal 1130.

The cache memory 1150 may store data and/or instructions. The cache memory 1150 may be a random access memory (RAM) that a computer processor (e.g., the processing engine 1112) can access more quickly than it accesses the storage device 1140. In some embodiments, the cache memory 1150 may store copies of instructions and/or data that are used frequently by the computer processor (e.g., the processing engine 1112), such that the computer processor may access the data more efficiently from the cache memory 1150 rather than the storage device 1140.

In some embodiments, the cache memory 1150 may be connected to the network 1120 to communicate with one or more components of the data processing system 1100 (e.g., the server 1110, the terminal 1130, or the storage device 1140). One or more components of the data processing system 1100 may access the data or instructions stored in the cache memory 1150 via the network 1120. In some embodiments, the cache memory 1150 may be directly connected to or communicate with one or more components of the data processing system 1100 (e.g., the server 1110, the terminal 1130, the storage device 1140). In some embodiments, the cache memory 1150 may be part of the server 1110. For example, the cache memory 1150 may be directly integrated in the processing engine 1112. As another example, the cache memory 1150 may be placed on a separate chip in the server 1110 that is connected to the processing engine 1112. In some embodiments, the cache memory 1150 may be integrated into the terminal 1130. For example, the cache memory 1150 may be integrated into a processor of the terminal 1130, or be integrated into the terminal 1130 and communicate with the processor of the terminal 1130.

One of ordinary skill in the art would understand that when an element (or component) of the data processing system 1100 performs an operation, the element may do so through electrical signals and/or electromagnetic signals. For example, when the terminal 1130 transmits a request to the server 1110, a processor of the terminal 1130 may generate an electrical signal encoding the request. The processor of the terminal 1130 may then transmit the electrical signal to an output port. If the terminal 1130 communicates with the server 1110 via a wired network, the output port may be physically connected to a cable, which may further transmit the electrical signal to an input port of the server 1110. If the terminal 1130 communicates with the server 1110 via a wireless network, the output port of the terminal 1130 may be one or more antennas, which convert the electrical signal to an electromagnetic signal. Within an electronic device, such as the terminal 1130 and/or the server 1110, when a processor thereof processes an instruction, transmits an instruction, and/or performs an action, the instruction and/or action is conducted via electrical signals. For example, when the processor retrieves or saves data from a storage medium, it may transmit electrical signals to a read/write device of the storage medium, which may read or write structured data in the storage medium. The structured data may be transmitted to the processor in the form of electrical signals via a bus of the electronic device. Here, an electrical signal may refer to one electrical signal, a series of electrical signals, and/or a plurality of discrete electrical signals.

In some embodiments, the storage device 1140 may store a set of instructions for querying data and feature information of a plurality of entities of the data processing system 1100. In some embodiments, an entity may refer to something having a real existence, as a subject or as an object, currently or potentially, concretely or abstractly, physically or virtually. The feature information may include at least one feature dimension for each entity, and at least one feature value for each feature dimension. In some embodiments, the plurality of entities may include at least one of service requesters, service providers, or service orders in an O2O service system. Merely by way of example, the plurality of entities may include a plurality of service requesters of a car-hailing service system. The feature information may include feature values of a plurality of feature dimensions (e.g., the gender, the age, the phone number, the address, the number of historical service orders) of each service requester. In some embodiments, the feature information stored in the storage device 1140 may also be referred to as an original data set for analysis.

A processor (e.g., the processing engine 1112, a processor of the terminal 1130) may be in communication with the storage device 1140 and execute the set of instructions. When executing the set of instructions, the processor may cause the data processing system 1100 or a component thereof to perform a data querying process. In some embodiments, the processing engine 1112 may be configured to determine a characteristic value of a selected feature dimension among feature values of the selected feature dimension of the plurality of entities. The processing engine 1112 may also establish a corresponding relationship between the characteristic value and the selected feature dimension. The selected feature dimension may be any one of the at least one feature dimension of the plurality of entities. In some embodiments, the processing engine 1112 may select the characteristic value of the selected feature dimension from the feature values of the selected feature dimension of the entities randomly or according to a selection rule. In certain embodiments, the processing engine 1112 may determine a mode of the feature values of the selected feature dimension of the entities as the characteristic value of the selected feature dimension.

After the characteristic value of the selected feature dimension is determined, the processing engine 1112 may cache the feature value of the selected feature dimension into the cache memory 1150 for each entity whose feature value of the selected feature dimension is unequal to the characteristic value. On the other hand, for each entity having a feature value of the selected feature dimension being equal to the characteristic value, the processing engine 1112 may leave the corresponding feature value of the selected feature dimension without caching. If a feature value of a feature dimension of an entity is not cached into the cache memory 1150, the corresponding feature value may be recorded as an empty entry (e.g., a Null value) in the cache memory 1150. Merely by way of example, if the characteristic value of the feature dimension “age” is 30, the processing engine 1112 may cache the feature value of “age” of each entity whose age is not 30 into the cache memory 1150, while leaving the feature value of “age” for each entity whose age is 30 without caching. The age of an entity whose age is not cached into the cache memory 1150 may be recorded as “Null” in the cache memory 1150.
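The caching step described above may be sketched as follows. This is a minimal illustration only: the dictionaries standing in for the storage device 1140 and the cache memory 1150, the function name `build_sparse_cache`, and the use of Python's `None` for the “Null” empty entry are all assumptions made for the example.

```python
from collections import Counter

def build_sparse_cache(storage, dimension):
    """Return (characteristic_value, cache) for one feature dimension.

    `storage` maps an entity ID to a dict of feature values and stands
    in for the original data set (an assumption of this sketch).
    """
    values = [features[dimension] for features in storage.values()]
    # Use the mode of the feature values as the characteristic value.
    characteristic = Counter(values).most_common(1)[0][0]
    cache = {}
    for entity_id, features in storage.items():
        value = features[dimension]
        # Values equal to the characteristic value are not cached;
        # None plays the role of the "Null" empty entry.
        cache[entity_id] = None if value == characteristic else value
    return characteristic, cache

storage = {
    "001": {"age": 30},
    "002": {"age": 25},
    "003": {"age": 30},
    "004": {"age": 41},
}
characteristic, cache = build_sparse_cache(storage, "age")
```

In this example, only the ages of users “002” and “004” occupy the cache, and the pair (“age”, 30) serves as the corresponding relationship.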

The processing engine 1112 may receive a query request related to the plurality of entities. In some embodiments, the query request may include one or more query conditions used to narrow the query request to entities that fulfill the query condition(s). For example, the query request may be a request to search feature information of entities that fulfill a query condition “age=30”. In some embodiments, the query request may be inputted by a user via the terminal 1130.

In some embodiments, in response to the query request, the processing engine 1112 may perform a first search in the cache memory to produce a first search result. For example, the processing engine 1112 may produce the first search result by searching information of entities that fulfill the query condition(s) of the query request. In some embodiments, the processing engine 1112 may determine whether the first search result includes an empty return. In response to a determination that the first search result does not include an empty return, the processing engine 1112 may directly generate a query result based on the first search result.

On the other hand, in response to a determination that the first search result includes an empty return, the processing engine 1112 may generate the query result based on the first search result and the corresponding relationship between the selected feature dimension and the characteristic value. In some embodiments, the processing engine 1112 may replace one or more empty returns related to the selected feature dimension in the first search result with the characteristic value of the selected feature dimension. The processing engine 1112 may further generate the query result based on the replaced first search result. In some embodiments, in response to a determination that the first search result includes an empty return, the processing engine 1112 may cache the characteristic value of the selected feature dimension into the cache memory 1150 based on the corresponding relationship for each entity whose selected feature dimension has an empty entry. The processing engine 1112 may perform a second search in the cache memory in response to the query request to produce a second search result, and generate the query result based on the second search result. For illustration purposes, an example of generating a query result in response to a query request of searching the age of a user whose ID is “001” is described. Assuming that the age of the user “001” is equal to the characteristic value of the feature dimension “age”, the first search result may indicate that the age of user “001” is “Null”. In some embodiments, the processing engine 1112 may replace the “Null” of the feature dimension “age” in the first search result with the corresponding characteristic value. The processing engine 1112 may then generate the query result based on the replaced first search result. Alternatively, the processing engine 1112 may cache the characteristic value of “age” into the cache memory 1150 for each user whose age is equal to the characteristic value. The processing engine 1112 may then perform a second search in the cache memory 1150, find the age of the user “001”, and generate the query result accordingly.
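The two alternatives described above (replacing empty returns, or caching the characteristic value and searching again) may be sketched as follows. The function names, the nested dictionary layout of the cache, and the `relationship` mapping from a feature dimension to its characteristic value are hypothetical; the sketch also assumes that every queried entity ID is present in the cache.

```python
def query_with_replacement(cache, relationship, dimension, entity_id):
    """First search, then replace an empty return with the characteristic value."""
    first_result = cache[dimension][entity_id]  # first search
    if first_result is None:                    # empty return ("Null")
        return relationship[dimension]
    return first_result

def query_with_backfill(cache, relationship, dimension, entity_id):
    """First search; on an empty return, cache the characteristic value
    for every empty entry and perform a second search."""
    first_result = cache[dimension][entity_id]  # first search
    if first_result is not None:
        return first_result
    for key, value in cache[dimension].items():
        if value is None:
            cache[dimension][key] = relationship[dimension]
    return cache[dimension][entity_id]          # second search

relationship = {"age": 30}
cache_a = {"age": {"001": None, "002": 25}}
cache_b = {"age": {"001": None, "002": 25, "003": None}}

age_replaced = query_with_replacement(cache_a, relationship, "age", "001")
age_backfilled = query_with_backfill(cache_b, relationship, "age", "001")
```

In this sketch, the replacement variant leaves the cache unchanged, whereas the backfill variant fills every empty entry of the dimension, so later searches on that dimension return no empty entries at the cost of additional cache space.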

In some embodiments, upon receiving the query request and before performing the first search in the cache memory 1150, the processing engine 1112 may cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry. The processing engine 1112 may then perform the first search in the cache memory after the characteristic value of the selected feature dimension is cached. This can avoid an undesirable first search result that includes an empty return.

In some embodiments, the processing engine 1112 may further analyze the query request before performing the first search. For example, the processing engine 1112 may determine whether the query request is related to the selected feature dimension and the corresponding characteristic value. In response to a determination that the query request is related to the selected feature dimension and the corresponding characteristic value, the processing engine 1112 may update the query request. The updated query request may include the feature dimension and an empty entry. The processing engine 1112 may then perform the first search in the cache memory based on the updated query request. For example, if the characteristic value of “age” is 30 and the query request is to search the ID of users whose age is 30, the processing engine 1112 may determine that the query request is related to the feature dimension “age” and its characteristic value, so that the processing engine 1112 may update the query request to search the ID of users whose age is “Null”. In this way, the processing engine 1112 may find in the cache memory 1150 the users whose age is recorded as “Null”, that is, the users whose age is equal to the characteristic value. In some embodiments, the processing engine 1112 may determine whether the query request is related to the selected feature dimension. In response to a determination that the query request is related to the selected feature dimension, the processing engine 1112 may cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry. The processing engine 1112 may then perform the first search in the cache memory in response to the query request. Details regarding the generation of the query result based on the analysis of the query request may be found elsewhere in the present disclosure (e.g., FIGS. 3 and 5 and the relevant descriptions thereof).
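Merely by way of example, updating a query request that targets the characteristic value may be sketched as follows. The function `find_ids`, the cache layout, and the `relationship` mapping are hypothetical names introduced for the illustration.

```python
def find_ids(cache, relationship, dimension, wanted):
    """Return the sorted IDs of entities whose feature value equals `wanted`.

    If `wanted` is the characteristic value of the dimension, the query
    is updated to search for the empty entry (None) instead.
    """
    if wanted == relationship[dimension]:
        # Updated query request: feature dimension plus an empty entry.
        def matches(value):
            return value is None
    else:
        def matches(value):
            return value == wanted
    return sorted(
        entity_id
        for entity_id, value in cache[dimension].items()
        if matches(value)
    )

relationship = {"age": 30}
cache = {"age": {"001": None, "002": 25, "003": None}}

ids_age_30 = find_ids(cache, relationship, "age", 30)
ids_age_25 = find_ids(cache, relationship, "age", 25)
```

Here the request for users whose age is 30 is answered entirely from the empty entries, without caching the value 30 for any user.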

In some embodiments, the feature information stored in the storage device 1140 may be updated according to different situations. For example, the feature information may be updated periodically. As another example, the feature information may be updated under an instruction of a user or when a number of new entities appear in the data processing system 1100. In some embodiments, the processing engine 1112 may update the feature information of the plurality of entities in the storage device 1140, and determine an updated characteristic value of the selected feature dimension based on the updated feature information. The processing engine 1112 may also perform the data processing process disclosed in the present disclosure based on the updated characteristic value. Details regarding the update of the feature information may be found elsewhere in the present disclosure (e.g., FIG. 2 and the relevant descriptions thereof).
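As a rough illustration of why the characteristic value may need to be re-determined after an update, the following sketch rebuilds the cache once new entities appear. The function name `refresh` and the sample data are assumptions of the example; a mode-based characteristic value is assumed.

```python
from collections import Counter

def refresh(storage, dimension):
    """Re-determine the characteristic value (mode) and rebuild the cache."""
    values = [features[dimension] for features in storage.values()]
    characteristic = Counter(values).most_common(1)[0][0]
    cache = {
        entity_id: (None if features[dimension] == characteristic
                    else features[dimension])
        for entity_id, features in storage.items()
    }
    return characteristic, cache

storage = {"001": {"age": 30}, "002": {"age": 30}, "003": {"age": 25}}
old_value, old_cache = refresh(storage, "age")

# Two new entities appear in the system and shift the mode.
storage.update({"004": {"age": 25}, "005": {"age": 25}})
new_value, new_cache = refresh(storage, "age")
```

In this example the empty entries in the rebuilt cache stand for 25 rather than 30, so the corresponding relationship and the cache would be updated together.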

FIG. 12 illustrates a schematic diagram of an exemplary computing device 1200 according to some embodiments of the present disclosure. The computing device 1200 may be a computer, such as the server 1110 in FIG. 11 and/or a computer with specific functions, configured to implement any particular system according to some embodiments of the present disclosure. The computing device 1200 may be configured to implement any components that perform one or more functions disclosed in the present disclosure. For example, the server 1110 may be implemented in hardware devices, software programs, firmware, or any combination thereof, of a computer like the computing device 1200. For brevity, FIG. 12 depicts only one computing device. In some embodiments, the functions of the computing device, including functions that recommending pick-up locations may require, may be implemented by a group of similar platforms in a distributed mode to disperse the processing load of the system.

The computing device 1200 may include a communication port 1250 that may connect with a network that may implement the data communication. The computing device 1200 may also include a processor 1220 that is configured to execute instructions and includes one or more processors. The schematic computer platform may include an internal communication bus 1210, different types of program storage units and data storage units (e.g., a hard disk 1270, a read-only memory (ROM) 1230, a random-access memory (RAM) 1240), various data files applicable to computer processing and/or communication, and some program instructions executed possibly by the processor 1220. The computing device 1200 may also include an I/O device 1260 that may support the input and output of data flows between the computing device 1200 and other components. Moreover, the computing device 1200 may receive programs and data via the communication network. In some embodiments, the computing device 1200 may further include a cache memory (not shown in FIG. 12) in communication with the processor 1220. In some embodiments, the cache memory may be integrated into the processor 1220 or the RAM 1240.

FIG. 13 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary mobile device on which a terminal 1130 may be implemented according to some embodiments of the present disclosure. As illustrated in FIG. 13, the mobile device 1300 may include a communication platform 1310, a display 1320, a graphic processing unit (GPU) 1330, a central processing unit (CPU) 1340, an I/O 1350, a memory 1360, a mobile operating system (OS) 1370, one or more applications 1380, and a storage 1390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in the mobile device 1300. In some embodiments, a mobile operating system 1370 (e.g., iOS™, Android™, Windows Phone™, etc.) and one or more applications 1380 may be loaded into the memory 1360 from the storage 1390 in order to be executed by the CPU 1340. The applications 1380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to data processing or other information from the data processing system 1100. User interactions with the information stream may be achieved via the I/O 1350 and provided to the storage device 1140, the server 1110 and/or other components of the data processing system 1100. In some embodiments, the mobile device 1300 may further include a cache memory in communication with the CPU 1340 and the storage 1390. In some embodiments, the cache memory may be integrated into the CPU 1340 or the memory 1360.

To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. A computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device. A computer may also act as a system if appropriately programmed.

Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur to those skilled in the art and are intended, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.

Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment,” “an embodiment,” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment,” “one embodiment,” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.

Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in an implementation combining software and hardware, any of which may generally be referred to herein as a “block,” “module,” “engine,” “unit,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a software as a service (SaaS).

Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations, therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server or mobile device.

Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.

We claim:
 1. A system for querying data, comprising: at least one storage device including a set of instructions and feature information of a plurality of entities, the feature information including at least one feature dimension for each entity, and at least one feature value for each feature dimension; a cache memory for storing data cached from the at least one storage device, wherein the data includes a feature value or an empty entry of a selected feature dimension of each entity of the plurality of entities, the empty entry representing a characteristic value of the selected feature dimension; at least one processor in communication with the at least one storage medium and the cache memory, wherein when executing the set of instructions, the at least one processor is configured to direct the system to: in response to a query request related to the plurality of entities, perform a first search in the cache memory to produce a first search result; obtain a corresponding relationship between the characteristic value and the selected feature dimension; and generate a query result of the query request based on the corresponding relationship and the first search result.
 2. The system of claim 1, wherein the at least one processor is further configured to direct the system to: determine the characteristic value of the selected feature dimension among the feature values of the selected feature dimension of the plurality of entities and establish the corresponding relationship between the characteristic value and the selected feature dimension; for each entity having a feature value of the selected feature dimension being unequal to the characteristic value, cache the feature value of the selected feature dimension into the cache memory; and for each entity having a feature value of the selected feature dimension being equal to the characteristic value, leave the corresponding feature value of the selected feature dimension without caching.
 3. The system of claim 2, wherein the at least one processor is further configured to direct the system to: update the feature information of the plurality of entities in the at least one storage medium; and determine an updated characteristic value of the selected feature dimension based on the updated feature information.
 4. The system of claim 1, wherein the characteristic value of the selected feature dimension is a mode of the feature values of the selected feature dimension of the plurality of entities.
 5. The system of claim 1, wherein to generate the query result of the query request, the at least one processor is further configured to direct the system to: replace one or more empty returns for the selected feature dimension in the first search result with the characteristic value.
 6. The system of claim 1, wherein to generate the query result of the query request, the at least one processor is further configured to direct the system to: determine whether the first search result includes an empty return; in response to a determination that the first search result includes an empty return, cache the characteristic value of the selected feature dimension into the cache memory based on the corresponding relationship for each entity whose selected feature dimension has an empty entry; perform a second search in the cache memory in response to the query request to produce a second search result; and generate the query result of the query request based on the second search result.
 7. The system of claim 1, wherein to perform the first search in the cache memory in response to the query request, the at least one processor is further configured to direct the system to: determine whether the query request is related to the selected feature dimension and the corresponding characteristic value; in response to a determination that the query request is related to the selected feature dimension and the corresponding characteristic value, update the query request, the updated query request including the feature dimension and an empty entry; and perform the first search in the cache memory based on the updated query request.
 8. The system of claim 1, wherein to perform the first search in the cache memory in response to the query request, the at least one processor is further configured to direct the system to: determine whether the query request is related to the selected feature dimension; in response to a determination that the query request is related to the selected feature dimension, cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry; and perform the first search in the cache memory in response to the query request.
 9. The system of claim 1, wherein to perform the first search in the cache memory in response to the query request, the at least one processor is further configured to direct the system to: in response to the query request, cache the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry; and perform the first search in the cache memory.
 10. The system of claim 1, wherein the plurality of entities include at least one of service requesters, service providers, or service orders in an Online to Offline (O2O) service system.
 11. A method implemented on a computing device having at least one processor, at least one storage device, a cache memory, and a communication platform connected to a network, wherein the at least one storage device includes feature information of a plurality of entities, the feature information includes at least one feature dimension for each entity, and at least one feature value for each feature dimension, the cache memory stores data cached from the at least one storage device, the data includes a feature value or an empty entry of a selected feature dimension of each entity of the plurality of entities, the empty entry represents a characteristic value of the selected feature dimension, and the method comprises: in response to a query request related to the plurality of entities, performing a first search in the cache memory to produce a first search result; obtaining a corresponding relationship between the characteristic value and the selected feature dimension; and generating a query result of the query request based on the corresponding relationship and the first search result.
 12. The method of claim 11, further comprising: determining the characteristic value of the selected feature dimension among the feature values of the selected feature dimension of the plurality of entities and establishing the corresponding relationship between the characteristic value and the selected feature dimension; for each entity having a feature value of the selected feature dimension being unequal to the characteristic value, caching the corresponding feature value of the selected feature dimension into the cache memory; and for each entity having a feature value of the selected feature dimension being equal to the characteristic value, leaving the feature value of the selected feature dimension without caching.
 13. The method of claim 12, further comprising: updating the feature information of the plurality of entities in the at least one storage medium; and determining an updated characteristic value of the selected feature dimension based on the updated feature information.
 14. The method of claim 11, wherein the characteristic value of the selected feature dimension is a mode of the feature values of the selected feature dimension of the plurality of entities.
 15. The method of claim 11, wherein the generating the query result of the query request comprises: replacing one or more empty returns for the selected feature dimension in the first search result with the characteristic value.
 16. The method of claim 11, wherein the generating the query result of the query request comprises: determining whether the first search result includes an empty return; in response to a determination that the first search result includes an empty return, caching the characteristic value of the selected feature dimension into the cache memory based on the corresponding relationship for each entity whose selected feature dimension has an empty entry; performing a second search in the cache memory in response to the query request to produce a second search result; and generating the query result of the query request based on the second search result.
 17. The method of claim 11, wherein the performing the first search in the cache memory in response to the query request comprises: determining whether the query request is related to the selected feature dimension and the corresponding characteristic value; in response to a determination that the query request is related to the selected feature dimension and the corresponding characteristic value, updating the query request, the updated query request including the feature dimension and an empty entry; and performing the first search in the cache memory based on the updated query request.
 18. The method of claim 11, wherein the performing the first search in the cache memory in response to the query request comprises: determining whether the query request is related to the selected feature dimension; in response to a determination that the query request is related to the selected feature dimension, caching the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry; and performing the first search in the cache memory in response to the query request.
 19. The method of claim 11, wherein the performing the first search in the cache memory in response to the query request comprises: in response to the query request, caching the characteristic value of the selected feature dimension into the cache memory for each entity whose selected feature dimension has an empty entry; and performing the first search in the cache memory.
 20. A non-transitory computer readable medium, comprising a set of instructions and feature information of a plurality of entities, the feature information including at least one feature dimension for each entity, and at least one feature value for each feature dimension, wherein when executed by at least one processor, the set of instructions direct the at least one processor to effectuate a method, the method comprising: in response to a query request related to the plurality of entities, performing a first search in a cache memory to produce a first search result, wherein the cache memory stores data cached from the at least one non-transitory computer readable medium, the data includes a feature value or an empty entry of a selected feature dimension of each entity of the plurality of entities, and the empty entry represents a characteristic value of the selected feature dimension; obtaining a corresponding relationship between the characteristic value and the selected feature dimension; and generating a query result of the query request based on the corresponding relationship and the first search result.